[00:00] https://archive.org/details/EMachinesEtower466ix500i500ixv4.0eMachines2000
[00:00] See, much better.
[00:02] ia search "collection:cdbbsarchive" --itemlist | wc -l
[00:02] 2768
[00:02] Burp
[00:03] OK, it's running against that list. 2,700 items, it'll do all the ones that have a .TIF scan but no cover image.
[00:16] very nice
[00:19] is it possible to run the same script against any CD with a full-size PNG as the cover photo? The CDs I uploaded all have that, and your cover photo size is much better than using the full-sized pic (which takes up half the screen)
[00:20] *** MMovie has quit IRC (Read error: Operation timed out)
[00:21] *** Famicoman has joined #archiveteam
[00:22] *** MMovie has joined #archiveteam
[00:24] I can.
[00:24] I might try it
[00:24] I'll do the 2700 first.
[00:24] As .TIF
[00:24] Then will do .PNG
[00:24] That'll do, PNG
[00:25] thanks! if you can't, that's fine.
[00:32] *** will has quit IRC (Quit: Goodbye)
[00:58] *** MMovie has quit IRC (Read error: Operation timed out)
[00:58] *** metalcamp has quit IRC (Ping timeout: 252 seconds)
[00:59] *** MMovie has joined #archiveteam
[01:08] *** JesseW has joined #archiveteam
[01:13] *** MMovie has quit IRC (Read error: Operation timed out)
[01:14] *** MMovie has joined #archiveteam
[01:16] *** kyan has joined #archiveteam
[01:19] *** kyan has left
[01:20] *** xXx_ndidd is now known as ndiddy
[01:29] *** MMovie has quit IRC (Read error: Operation timed out)
[01:30] *** MMovie has joined #archiveteam
[01:30] *** godane has quit IRC (Read error: Operation timed out)
[01:36] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[01:42] *** phuzion has quit IRC (Read error: Operation timed out)
[01:48] *** phuzion has joined #archiveteam
[01:49] *** godane has joined #archiveteam
[01:54] *** MMovie has quit IRC (Read error: Operation timed out)
[01:55] *** MMovie has joined #archiveteam
[01:58] *** philpem has quit IRC (Ping timeout: 260 seconds)
[02:04] *** wp494 has quit IRC (Read error: Connection reset by peer)
[02:05] *** JesseW has quit IRC (Quit: Leaving.)
[02:10] *** MMovie has quit IRC (Read error: Operation timed out)
[02:12] *** MMovie has joined #archiveteam
[02:21] *** wp494 has joined #archiveteam
[02:38] *** megaminxw has joined #archiveteam
[02:40] *** MMovie has quit IRC (Read error: Operation timed out)
[02:42] *** MMovie has joined #archiveteam
[02:45] *** Stiletto has joined #archiveteam
[02:55] *** MMovie has quit IRC (Read error: Operation timed out)
[02:56] *** godane has quit IRC (Quit: Leaving.)
[02:56] *** godane has joined #archiveteam
[02:57] *** MMovie has joined #archiveteam
[03:03] *** xXx_ndidd has joined #archiveteam
[03:04] *** snape has quit IRC (Hey! Where'd my controlling terminal go?)
[03:09] *** MMovie has quit IRC (Read error: Operation timed out)
[03:10] *** ndiddy has quit IRC (Ping timeout: 492 seconds)
[03:11] *** MMovie has joined #archiveteam
[03:15] *** megaminxw has quit IRC (Quit: Leaving.)
[03:17] *** snape has joined #archiveteam
[03:20] It's hard. I will not do it automatically.
[03:21] okay, thanks!
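The cover-image job discussed above comes down to two checks per item (does it have a source scan, .TIF on the first pass or .PNG on the second, and does it already have a cover?) plus shrinking the scan to a reasonable display size. The actual script run in the channel is not shown in the log, so this is only a minimal Python sketch under stated assumptions: generated covers are assumed to follow a hypothetical `_cover.jpg` naming convention, and the 800-pixel limit is an arbitrary choice.

```python
from pathlib import PurePosixPath

# Hypothetical conventions, not taken from the log:
COVER_SUFFIX = "_cover.jpg"        # assumed name pattern of a generated cover
SCAN_EXTENSIONS = (".tif", ".tiff")


def needs_cover(filenames, scan_exts=SCAN_EXTENSIONS, cover_suffix=COVER_SUFFIX):
    """Return the scan file to derive a cover from, or None.

    None means the item either has no scan of the requested type
    or already has a cover image.
    """
    if any(name.lower().endswith(cover_suffix) for name in filenames):
        return None
    scans = sorted(
        name for name in filenames
        if PurePosixPath(name).suffix.lower() in scan_exts
    )
    return scans[0] if scans else None


def fit_within(width, height, max_side=800):
    """Scale (width, height) so the longer side is at most max_side,
    keeping the aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The file names per item could come from `ia list <identifier>` for each identifier printed by `ia search "collection:cdbbsarchive" --itemlist`; the PNG pass is the same check called with `scan_exts=(".png",)`.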
[03:26] *** MMovie has quit IRC (Read error: Operation timed out)
[03:27] *** MMovie has joined #archiveteam
[03:43] *** BubuAnabe has joined #archiveteam
[03:44] *** MMovie has quit IRC (Read error: Operation timed out)
[03:45] *** MMovie has joined #archiveteam
[04:02] *** MMovie has quit IRC (Read error: Operation timed out)
[04:03] *** MMovie has joined #archiveteam
[04:13] *** vitzli has joined #archiveteam
[04:13] *** JetBalsa has joined #archiveteam
[04:35] *** MMovie has quit IRC (Read error: Operation timed out)
[04:36] *** MMovie has joined #archiveteam
[04:54] *** MMovie has quit IRC (Read error: Operation timed out)
[04:55] *** MMovie has joined #archiveteam
[05:04] *** JesseW has joined #archiveteam
[05:06] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[05:12] *** MMovie has quit IRC (Read error: Operation timed out)
[05:13] *** MMovie has joined #archiveteam
[05:18] *** bwn has quit IRC (Read error: Operation timed out)
[05:25] *** megaminxw has joined #archiveteam
[05:32] *** Sk1d has quit IRC (Ping timeout: 200 seconds)
[05:37] *** Sk1d has joined #archiveteam
[05:43] *** MMovie has quit IRC (Read error: Operation timed out)
[05:45] *** MMovie has joined #archiveteam
[05:56] *** WinterFox has joined #archiveteam
[06:02] *** MMovie has quit IRC (Read error: Operation timed out)
[06:04] *** MMovie has joined #archiveteam
[06:22] *** MMovie has quit IRC (Read error: Operation timed out)
[06:23] *** MMovie has joined #archiveteam
[06:40] *** MMovie has quit IRC (Read error: Operation timed out)
[06:41] *** MMovie has joined #archiveteam
[06:58] TIL that archive.org doesn't prevent users from uploading files with newlines in their names. :-( For example, https://archive.org/details/FarroEFuocoNessunaFacciaBuonaPulitaEGiustaAEXPO2015
[06:58] *** superkuh has quit IRC (Read error: Connection reset by peer)
[06:58] *** MMovie has quit IRC (Read error: Operation timed out)
[06:59] *** MMovie has joined #archiveteam
[07:07] *** superkuh has joined #archiveteam
[07:12] o.O
[07:12] yeah
[07:12] I don't (yet) have an example of embedded tabs (or nulls) -- but I wouldn't be surprised to find one...
[07:13] I had similar issues with jq output
[07:14] I don't remember any nulls, but several \n files were there - broke my md5 list, maybe \t were there too
[07:15] how did you solve it?
[07:16] I think I just dropped them
[07:17] *** lbft_ has quit IRC (Ping timeout: 633 seconds)
[07:17] or sed'ed \ns
[07:17] hm
[07:19] I think using the jq @uri feature works
[07:19] I think I parsed the resulting file dropping all the lines that were not matching "md5 *itemname/filename" pattern
[07:19] heh
[07:20] I would still have md5 and item id - quite enough to find the file
[07:23] *** JesseW has quit IRC (Quit: Leaving.)
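The md5-list breakage described above happens when a file name containing a newline is printed verbatim: jq's `--raw-output` writes strings unescaped, so the embedded `\n` splits one entry across two lines. Two workarounds come up in the channel: percent-encoding names (jq's `@uri`) and dropping every line that doesn't match the `md5 *itemname/filename` pattern. The Python sketch below shows both, assuming file records shaped like the `files` list (`name` and `md5` keys) in archive.org's metadata JSON. `urllib.parse.quote` stands in for `@uri`; its set of characters left unescaped is not guaranteed to match jq's exactly.

```python
import re
from urllib.parse import quote, unquote

# One manifest line: 32 hex digits, " *", item identifier, "/", file name.
LINE_RE = re.compile(r"([0-9a-fA-F]{32}) \*([^/\s]+)/(.+)")


def build_manifest(identifier, files):
    """Write 'md5 *identifier/name' lines with names percent-encoded,
    so a newline or tab inside a name cannot split an entry."""
    lines = [
        f"{f['md5']} *{identifier}/{quote(f['name'], safe='/')}"
        for f in files
        if "md5" in f  # some entries may lack a checksum; skip them
    ]
    return "".join(line + "\n" for line in lines)


def parse_manifest(text, decode=True):
    """Yield (md5, identifier, name) for well-formed lines only.

    Lines that don't match the pattern (e.g. the tail half of a name
    that was split by a raw newline) are silently dropped, which still
    leaves the md5 and the item id for the broken entry.
    """
    for line in text.split("\n"):
        match = LINE_RE.fullmatch(line)
        if match:
            md5, identifier, name = match.groups()
            yield md5, identifier, unquote(name) if decode else name
```

With `decode=False` the same parser works as the "drop non-matching lines" cleanup for an old, unencoded manifest; decoding is switched off there because a raw name may legitimately contain a `%`.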
[07:24] *** GLaDOS has quit IRC (Ping timeout: 260 seconds)
[07:25] *** schbirid has joined #archiveteam
[07:28] I suspect that it happens because of --raw-output option
[07:30] *** MMovie has quit IRC (Read error: Operation timed out)
[07:30] *** MMovie has joined #archiveteam
[07:39] *** lbft has joined #archiveteam
[07:53] *** MMovie has quit IRC (Read error: Operation timed out)
[08:01] *** MMovie has joined #archiveteam
[08:19] *** MMovie has quit IRC (Read error: Operation timed out)
[08:19] *** MMovie has joined #archiveteam
[08:22] *** metalcamp has joined #archiveteam
[08:36] *** MMovie has quit IRC (Read error: Operation timed out)
[08:37] *** MMovie has joined #archiveteam
[08:55] *** MMovie has quit IRC (Read error: Operation timed out)
[08:56] *** MMovie has joined #archiveteam
[08:58] *** bwn has joined #archiveteam
[09:08] *** MMovie has quit IRC (Read error: Operation timed out)
[09:09] *** MMovie has joined #archiveteam
[09:25] *** metalcamp has quit IRC (Ping timeout: 252 seconds)
[09:45] *** MMovie has quit IRC (Read error: Operation timed out)
[09:46] *** MMovie has joined #archiveteam
[09:49] *** dxrt- sets mode: +o dxrt
[09:59] *** MMovie has quit IRC (Read error: Operation timed out)
[09:59] *** MMovie has joined #archiveteam
[10:11] *** will has joined #archiveteam
[10:16] *** MMovie has quit IRC (Read error: Operation timed out)
[10:17] *** MMovie has joined #archiveteam
[10:32] *** MMovie has quit IRC (Read error: Operation timed out)
[10:33] *** MMovie has joined #archiveteam
[10:51] *** MMovie has quit IRC (Read error: Operation timed out)
[10:52] *** MMovie has joined #archiveteam
[11:10] *** MMovie has quit IRC (Read error: Operation timed out)
[11:11] *** MMovie has joined #archiveteam
[11:28] *** MMovie has quit IRC (Read error: Operation timed out)
[11:29] *** MMovie has joined #archiveteam
[11:46] *** MMovie has quit IRC (Read error: Operation timed out)
[11:48] *** MMovie has joined #archiveteam
[12:06] *** MMovie has quit IRC (Read error: Operation timed out)
[12:06] SketchCow: I've switched FOS back on for fotolog and friendsreunited
[12:07] *** MMovie has joined #archiveteam
[12:07] Thanks
[12:07] We'll see how it holds up
[12:08] ok
[12:08] Fketcher will sync the data from his temporary rsync target to FOS
[12:08] Fletcher*
[12:10] SketchCow: If you have some time, please let me know what you think of the videobot idea. Scripts are partially written, but writing is paused until the project is approved
[12:22] *** MMovie has quit IRC (Read error: Operation timed out)
[12:24] *** MMovie has joined #archiveteam
[12:37] *** MMovie has quit IRC (Read error: Operation timed out)
[12:39] *** MMovie has joined #archiveteam
[12:57] *** MMovie has quit IRC (Read error: Operation timed out)
[12:58] *** MMovie has joined #archiveteam
[13:03] *** SadDM has joined #archiveteam
[13:07] *** megaminxw has quit IRC (Quit: Leaving.)
[13:18] *** WinterFox has quit IRC (Remote host closed the connection)
[13:27] *** MMovie has quit IRC (Read error: Operation timed out)
[13:29] *** MMovie has joined #archiveteam
[13:41] If anyone has some ideas for videobot, please let me know
[13:41] I'm thinking of adding some system so people can add their own periodic scraping jobs to videobot
[13:42] Add the URL, let videobot know what needs to be extracted. Let videobot know how to extract metadata. How often this should be done
[13:42] And then videobot will upload as video/audio/text item to IA and as WARC into a pack
[13:42] Will be nice for PDF newspaper downloads, for example
[13:43] Or create a job to extract new politics videos periodically
[13:44] what about creating thumbnails of each video
[13:45] Can be added, but IA already does that for new videos, right?
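The periodic-job idea for videobot above (a URL, what to extract, how to extract metadata, how often, and whether the result becomes a video, audio or text item) maps naturally onto a small job record plus a "which jobs are due" check. No videobot code appears in the log, so everything here is hypothetical: the `ScrapeJob` fields, the free-form `metadata_rules` mapping and `due_jobs` are illustrative names only. The `mediatype` values `movies`, `audio` and `texts` are archive.org's mediatypes for those three kinds of item.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class ScrapeJob:
    """Hypothetical user-submitted periodic videobot job."""
    url: str                         # page or feed to check
    extract: str                     # what to pull out, e.g. "video" or "pdf"
    metadata_rules: Dict[str, str]   # field name -> extraction rule (format assumed)
    interval: timedelta              # how often the job should run
    mediatype: str = "movies"        # IA item type: "movies", "audio" or "texts"
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # A job that has never run is always due.
        return self.last_run is None or now - self.last_run >= self.interval

    def mark_run(self, now: datetime) -> None:
        self.last_run = now


def due_jobs(jobs: List[ScrapeJob], now: datetime) -> List[ScrapeJob]:
    """Return jobs whose interval has elapsed. A real scheduler would then
    fetch each one, upload the item to IA and add the WARC to a pack."""
    return [job for job in jobs if job.is_due(now)]
```

A daily newspaper-PDF job would be `ScrapeJob(url, "pdf", rules, timedelta(days=1), mediatype="texts")`; keeping the schedule in the job record lets a single loop serve every user's jobs.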
[13:46] * Fletcher nods
[13:46] not sure of the quality though (if you're looking for something really nice)
[13:47] arkiver: add a -follow feature, grabbing the entire channel and updated vids when they air
[13:48] and a -live feature, grabbing stream + comments, could be handy to weave out any censorship
[13:48] That would be periodic scraping of YouTube/Dailymotion/etc. channels?
[13:48] stream+comments is a nice idea
[13:48] maybe we should scoot over to -bs or #videobot?
[13:49] let's do that
[13:49] #videobot for videobot discussions!
[13:55] *** philpem has joined #archiveteam
[14:04] *** MMovie has quit IRC (Read error: Operation timed out)
[14:05] *** MMovie has joined #archiveteam
[14:15] *** jut has joined #archiveteam
[14:16] Please do not join #videobot yet! We need everyone out so we can get ops back
[14:20] lol
[14:21] *** MMovie has quit IRC (Read error: Operation timed out)
[14:21] related: any chance of moving archiveteam anywhere but efnet? :P
[14:23] *** MMovie has joined #archiveteam
[14:40] *** MMovie has quit IRC (Read error: Operation timed out)
[14:42] *** MMovie has joined #archiveteam
[14:43] *** scyther has joined #archiveteam
[14:59] *** MMovie has quit IRC (Read error: Operation timed out)
[15:00] *** MMovie has joined #archiveteam
[15:14] *** MMovie has quit IRC (Read error: Operation timed out)
[15:15] *** MMovie has joined #archiveteam
[15:16] *** Start has quit IRC (Quit: Disconnected.)
[15:22] arkiver, give the signal when we can return
[15:27] everyone is out
[15:28] do we have to wait for some time? or can we rejoin immediately
[15:28] come in
[15:29] Everyone can join #videobot again!
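The suggested `-follow` option (a proposal in the channel, not an existing videobot flag) amounts to: grab the whole channel on the first pass, then on each later poll grab only uploads that haven't been seen before. A minimal sketch, assuming the channel listing is already available as a list of video IDs; how that listing is fetched from YouTube, Dailymotion, etc. is out of scope, and `new_videos` is a hypothetical helper name.

```python
from typing import Iterable, List, Set


def new_videos(seen: Set[str], listing: Iterable[str]) -> List[str]:
    """Return IDs from the latest channel listing that have not been
    grabbed yet, in listing order, and record them in `seen`.

    With an empty `seen` set this returns the entire channel (the
    initial full grab); afterwards it returns only new uploads.
    """
    fresh = []
    for video_id in listing:
        if video_id not in seen:
            seen.add(video_id)
            fresh.append(video_id)
    return fresh
```

The `seen` set would need to be persisted between polls (for example, alongside the job record) so a restart doesn't trigger a full re-grab.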
[15:37] *** MMovie has quit IRC (Read error: Operation timed out)
[15:38] *** MMovie has joined #archiveteam
[15:48] *** Start has joined #archiveteam
[15:55] *** MMovie has quit IRC (Read error: Operation timed out)
[15:57] *** MMovie has joined #archiveteam
[16:15] *** MMovie has quit IRC (Read error: Operation timed out)
[16:16] *** MMovie has joined #archiveteam
[16:23] *** Start_ has joined #archiveteam
[16:23] *** Start has quit IRC (Read error: Connection reset by peer)
[16:23] *** Start_ is now known as Start
[16:24] *** MMovie has quit IRC (Read error: Operation timed out)
[16:25] *** MMovie has joined #archiveteam
[16:34] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[16:37] *** MMovie has quit IRC (Read error: Operation timed out)
[16:38] *** MMovie has joined #archiveteam
[16:41] *** JesseW has joined #archiveteam
[16:54] *** MMovie has quit IRC (Read error: Operation timed out)
[16:55] *** MMovie has joined #archiveteam
[16:58] *** Tomcat_ has joined #archiveteam
[17:05] *** Zei-Pii has joined #archiveteam
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[17:12] *** JesseW has quit IRC (Quit: Leaving.)
[17:23] *** MMovie has quit IRC (Read error: Operation timed out)
[17:24] *** MMovie has joined #archiveteam
[17:30] *** Stiletto has joined #archiveteam
[17:39] arkiver: Is that fixed, or do you still need more targets?
[17:40] (IRC is not one of my primary communication channels, so latency can get high...)
[17:40] *** Emcy has joined #archiveteam
[17:41] *** MMovie has quit IRC (Read error: Operation timed out)
[17:41] *** MMovie has joined #archiveteam
[17:45] *** BubuAnabe has quit IRC (Ping timeout: 633 seconds)
[17:54] Not a single chance of moving archiveteam from EFNet
[17:54] Zero
[17:55] *** MMovie has quit IRC (Read error: Operation timed out)
[17:55] *** MMovie has joined #archiveteam
[17:59] *** vitzli has quit IRC (Leaving)
[18:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[18:09] *** dashcloud has joined #archiveteam
[18:13] <3
[18:15] *** scyther has quit IRC (Read error: Connection reset by peer)
[18:15] *** Tomcat_ has quit IRC (Ping timeout: 633 seconds)
[18:16] *** Zei-Pii| has joined #archiveteam
[18:19] *** Zei-Pii has quit IRC (Read error: Operation timed out)
[18:22] *** Tomcat_ has joined #archiveteam
[18:33] worth1000 is closing down in a week
[18:33] *** bwn has quit IRC (Ping timeout: 246 seconds)
[18:33] *** bithippo has joined #archiveteam
[18:34] What!
[18:34] https://gist.github.com/espes/9c5e8eceb3749c674b5f
[18:35] looks like migration is opt-out at least
[18:36] feels like every time a site does a migration old stuff gets disappeared ¯\_(ツ)_/¯
[18:42] *** Tomcat_ has quit IRC (Read error: Connection reset by peer)
[18:47] *** MMovie has quit IRC (Read error: Operation timed out)
[18:47] *** MMovie has joined #archiveteam
[19:03] *** bwn has joined #archiveteam
[19:04] *** bithippo has quit IRC (Quit: Page closed)
[19:13] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[19:14] *** Stiletto has joined #archiveteam
[19:22] *** MMovie has quit IRC (Read error: Operation timed out)
[19:23] *** MMovie has joined #archiveteam
[19:24] *** Tomcat_ has joined #archiveteam
[19:25] *** Start has joined #archiveteam
[19:36] *** MMovie has quit IRC (Read error: Operation timed out)
[19:36] *** MMovie has joined #archiveteam
[19:38] *** lytv has quit IRC (Quit: Leaving)
[19:46] *** lytv has joined #archiveteam
[20:06] *** MMovie has quit IRC (Read error: Operation timed out)
[20:08] *** MMovie has joined #archiveteam
[20:10] *** scyther has joined #archiveteam
[20:11] *** MMovie has quit IRC (Read error: Operation timed out)
[20:12] *** MMovie has joined #archiveteam
[20:30] *** MMovie has quit IRC (Read error: Operation timed out)
[20:32] *** MMovie has joined #archiveteam
[20:34] *** VADemon has joined #archiveteam
[20:45] *** Start has quit IRC (Quit: Disconnected.)
[21:00] *** aMunster has joined #archiveteam
[21:05] *** MMovie has quit IRC (Read error: Operation timed out)
[21:06] *** MMovie has joined #archiveteam
[21:21] *** MMovie has quit IRC (Read error: Operation timed out)
[21:22] *** megaminxw has joined #archiveteam
[21:23] *** MMovie has joined #archiveteam
[21:25] *** jut has quit IRC (jut)
[21:26] *** antomati_ has joined #archiveteam
[21:28] *** schbirid has quit IRC (Ping timeout: 258 seconds)
[21:29] *** antomatic has quit IRC (Ping timeout: 258 seconds)
[21:30] *** megaminxw has quit IRC (west.us.hub irc.Prison.NET)
[21:30] *** Chorca has quit IRC (west.us.hub irc.Prison.NET)
[21:30] *** gibigiana has quit IRC (west.us.hub irc.Prison.NET)
[21:30] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[21:31] *** scyther has quit IRC (Read error: Connection reset by peer)
[21:37] *** schbirid has joined #archiveteam
[21:37] *** WinterFox has joined #archiveteam
[21:40] *** Chorca1 has joined #archiveteam
[21:40] *** Tomcat_ has quit IRC (Remote host closed the connection)
[21:49] *** gibigiana has joined #archiveteam
[21:58] *** MMovie has quit IRC (Read error: Operation timed out)
[21:59] *** MMovie has joined #archiveteam
[22:05] !a http://jmcauley.ucsd.edu/data/amazon/
[22:06] wrong chan, I've thrown it in the right one
[22:06] oops
[22:06] hmm, it's now being grabbed twice
[22:20] arkiver: See personal message for new rsync modules.
[22:33] *** RedType has left
[22:52] *** Zei-Pii| has quit IRC (Read error: Connection reset by peer)
[23:12] *** MMovie has quit IRC (Read error: Operation timed out)
[23:13] *** MMovie has joined #archiveteam
[23:25] *** MMovie has quit IRC (Read error: Operation timed out)
[23:26] *** MMovie has joined #archiveteam
[23:32] *** tomwsmf-a has joined #archiveteam
[23:33] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:34] *** dashcloud has joined #archiveteam
[23:41] *** MMovie has quit IRC (Read error: Operation timed out)
[23:43] *** MMovie has joined #archiveteam
[23:43] *** Start has joined #archiveteam
[23:57] *** MMovie has quit IRC (Read error: Operation timed out)
[23:59] *** MMovie has joined #archiveteam