US 12,287,620 B2
Machine learning device, numerical control system, setting device, numerical control device, and machine learning method
Yoshiyuki Suzuki, Yamanashi (JP)
Assigned to FANUC CORPORATION, Yamanashi (JP)
Appl. No. 17/802,420
Filed by FANUC CORPORATION, Yamanashi (JP)
PCT Filed Mar. 10, 2021, PCT No. PCT/JP2021/009488 § 371(c)(1), (2) Date Aug. 25, 2022, PCT Pub. No. WO2021/187268, PCT Pub. Date Sep. 23, 2021.
Claims priority of application No. 2020-046070 (JP), filed on Mar. 17, 2020.
Prior Publication US 2023/0083761 A1, Mar. 16, 2023
Int. Cl. G05B 19/4155 (2006.01); G05B 19/4065 (2006.01)

CPC G05B 19/4155 (2013.01) [G05B 19/4065 (2013.01); G05B 2219/37346 (2013.01); G05B 2219/40499 (2013.01); G05B 2219/50308 (2013.01)]

10 Claims

1. A machine learning device configured to perform machine learning with respect to a numerical control device configured to cause a machine tool to operate based on a machining program, the machine learning device comprising:

a memory that stores a program; and

a processor configured to execute the program and control the machine learning device to:

acquire, as the numerical control device executes the machining program set with at least a cutting amount for one time and a cutting rate and causes the machine tool to perform cutting work, state information including the cutting amount for one time and the cutting rate;

output action information including adjustment information for the cutting amount for one time and the cutting rate included in the state information;

acquire determination information that is information regarding at least a magnitude of pressure applied to a tool during the cutting work, a shape of a waveform of the pressure applied to the tool, and a period of time taken for the cutting work, and, based on the determination information that has been acquired, output a reward value used in reinforcement learning depending on a predetermined condition; and

update a value function based on the reward value, the state information, and the action information,

wherein the predetermined condition is either of a condition for prioritizing machining time and a condition for prioritizing lifetime of the tool,

the processor outputs a first reward value under the condition for prioritizing machining time and outputs a second reward value under the condition for prioritizing the lifetime of the tool, and

the processor updates a first value function based on the first reward value, the state information, and the action information under the condition for prioritizing machining time and updates a second value function based on the second reward value, the state information, and the action information under the condition for prioritizing the lifetime of the tool.
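
The learning loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a learning algorithm, a reward formula, or any identifiers, so the tabular Q-learning update, the weighted-difference reward, the weights, and every name below (`Determination`, `reward`, `DualValueLearner`, `ACTIONS`) are assumptions made for illustration. The waveform shape is reduced here to a single hypothetical score. The sketch shows only the structure the claim describes: one reward and one value function per condition, with the condition selecting which pair is used.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the two predetermined conditions in the claim.
TIME_PRIORITY = "machining_time"
TOOL_LIFE_PRIORITY = "tool_life"


@dataclass(frozen=True)
class Determination:
    """Determination information for one cutting operation (assumed layout)."""
    pressure: float        # magnitude of pressure applied to the tool
    waveform_score: float  # assumed scalar summary of waveform shape, higher is better
    duration: float        # time taken for the cutting work


# Assumed weights (pressure, waveform, duration); the claim gives no values.
REWARD_WEIGHTS = {
    TIME_PRIORITY: (0.2, 0.2, 1.0),
    TOOL_LIFE_PRIORITY: (1.0, 1.0, 0.2),
}


def reward(prev: Determination, curr: Determination, condition: str) -> float:
    """Reward the improvement from prev to curr, weighted by the condition."""
    w_p, w_w, w_t = REWARD_WEIGHTS[condition]
    return (
        w_p * (prev.pressure - curr.pressure)
        + w_w * (curr.waveform_score - prev.waveform_score)
        + w_t * (prev.duration - curr.duration)
    )


# Action information: adjustments to (cutting amount for one time, cutting rate).
# The step sizes of -1, 0, +1 are placeholders.
ACTIONS = [(da, dr) for da in (-1, 0, 1) for dr in (-1, 0, 1)]


class DualValueLearner:
    """Keeps a separate value function (Q-table) for each condition."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9):
        self.alpha = alpha  # learning rate (assumed)
        self.gamma = gamma  # discount factor (assumed)
        self.q = {
            TIME_PRIORITY: defaultdict(float),
            TOOL_LIFE_PRIORITY: defaultdict(float),
        }

    def update(self, condition, state, action, r, next_state) -> float:
        """Apply one Q-learning update to the table selected by the condition."""
        table = self.q[condition]
        best_next = max(table[(next_state, a)] for a in ACTIONS)
        td_target = r + self.gamma * best_next
        table[(state, action)] += self.alpha * (td_target - table[(state, action)])
        return table[(state, action)]
```

In this sketch, a state is a `(cutting amount, cutting rate)` tuple read from the machining program, and one machining run under the machining-time condition would call `reward(prev, curr, TIME_PRIORITY)` followed by `update(TIME_PRIORITY, ...)`, leaving the tool-lifetime table untouched. Keeping two tables rather than one shared table mirrors the claim's distinction between the first and second value functions.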