[00:00] *** fie has quit IRC (Ping timeout: 268 seconds)
[00:18] *** Asparagir has quit IRC (Asparagir)
[00:27] *** BlueMax has joined #archiveteam-bs
[00:28] *** Mateon1 has quit IRC (Remote host closed the connection)
[00:38] *** Mateon1 has joined #archiveteam-bs
[00:40] *** Jusque has joined #archiveteam-bs
[00:44] *** Jusque_ has joined #archiveteam-bs
[00:47] *** fie has joined #archiveteam-bs
[00:54] *** Jusque has quit IRC (Read error: Operation timed out)
[00:54] *** Jusque_ is now known as Jusque
[01:11] *** kisspunch has quit IRC (Ping timeout: 260 seconds)
[01:14] *** kisspunch has joined #archiveteam-bs
[02:06] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[02:16] *** BlueMax has quit IRC (Leaving)
[02:54] *** BlueMax has joined #archiveteam-bs
[02:57] *** Sk1d has quit IRC (Read error: Operation timed out)
[03:00] *** Sk1d has joined #archiveteam-bs
[03:19] *** qw3rty115 has joined #archiveteam-bs
[03:22] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:23] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[03:34] *** odemg has joined #archiveteam-bs
[04:36] *** Asparagir has joined #archiveteam-bs
[04:37] *** svchfoo1 sets mode: +o Asparagir
[04:39] so, my fanfic dumps got dmca'd.
[04:39] just ffnet, not ao3
[04:51] bsmith093: which item? I can still access https://archive.org/download/fanfictiondotnet_repack and https://archive.org/download/Fanfictiondotnet1011dump fine.
[04:52] those and the smaller 100k-links-at-a-time ones.
[04:52] should i just keep going?
[04:52] Do you have them somewhere else?
[04:53] not really, once i found out i grabbed them asap. where's a good place?
[05:19] SketchCow: a lot of recently uploaded items got darked: https://archive.org/details/@seanpaulk
[05:19] there
[05:20] all i got from one of the items for why it's dark is this: Darked using /home/jake/scripts/lock_delete_darke_user.py
[05:21] https://catalogd.archive.org/log/900209219
[05:21] maybe the malware script darked them, but not sure
[05:23] for all the items that got darked: http://archive.org/metamgr.php?&w_uploader=seanpaulk@hotmail.com&mode=more
[05:24] even the guy's fav-seanpaulk collection got darked
[05:27] godane: Is there a way for me to get a list of what items of mine are darked?
[05:28] using the email address for your account, yes
[05:29] I tried putting my address in there and still got "not authorized to access this service"
[05:30] what is your user name?
[05:30] hook54321a, dm me what it says is darked though.
[05:35] my account size: 131,113,480,968 KB
[05:36] 131.1 TB
[05:37] What exactly is the Meta-manager used for?
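(Meta-manager is an admin-only interface, which is why the query above fails for non-admins. A minimal sketch, not from the log, of how a non-admin could spot darked items instead: keep a saved list of your own identifiers and poll the public metadata endpoint, which is assumed here to report an `is_dark` flag for darked items. The filename `my_items.txt` is hypothetical.)

```python
# Minimal sketch (not from the log): check a saved list of your own item
# identifiers against the public metadata endpoint. Darked items drop out of
# public search, but /metadata/<identifier> is assumed here to still answer
# with an "is_dark" flag for them.
import requests

def darked_items(identifiers):
    """Yield the identifiers that the metadata API reports as darked."""
    for identifier in identifiers:
        resp = requests.get(f"https://archive.org/metadata/{identifier}", timeout=30)
        resp.raise_for_status()
        if resp.json().get("is_dark"):
            yield identifier

if __name__ == "__main__":
    # "my_items.txt" is a hypothetical list of identifiers saved while the
    # items were still public, one identifier per line.
    with open("my_items.txt") as fh:
        for item in darked_items(line.strip() for line in fh if line.strip()):
            print(item)
```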
[05:38] i think maybe you can't see it cause you're not an admin
[05:38] anyways i can see your items
[05:38] http://archive.org/metamgr.php?&w_uploader=hook54321a@gmail.com&mode=more
[05:39] anyways you got 10 items that are dark
[05:40] 2 are in test_collection
[05:51] my full item count, including darked ones, is 1,416,872
[05:55] *** DragonMon has joined #archiveteam-bs
[06:07] *** Mateon1 has quit IRC (Read error: Operation timed out)
[06:07] *** Mateon1 has joined #archiveteam-bs
[06:10] *** Asparagir has quit IRC (Asparagir)
[07:41] *** omglolbah has quit IRC (Ping timeout: 268 seconds)
[09:46] *** ta9le has joined #archiveteam-bs
[09:49] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:50] *** Sk1d has joined #archiveteam-bs
[10:09] *** elise has joined #archiveteam-bs
[10:12] *** elise has quit IRC (Quit: leaving)
[10:13] *** elise has joined #archiveteam-bs
[10:13] 08:59 < elise> Hello archivers!
[10:13] 08:59 < elise> I have a question regarding youtube content
[10:13] 09:00 < elise> more specifically the metadata, not the video content per se
[10:13] 09:00 < elise> I saw that there is an ongoing effort regarding the videos, but is there something similar for metadata?
[10:13] 09:01 < elise> What are the tools used for the organization of metadata?
[10:16] would you like to start a project to archive all of youtube's metadata? I can provide design assistance
[10:16] it's just a few billion public videos, so it should fit on a server
[10:20] well, maybe more than one if you're grabbing video descriptions and comments and such
[10:22] keeping track of all this metadata would help a lot with figuring out what to archive and keeping track of unlisted videos
[10:30] *** omglolbah has joined #archiveteam-bs
[10:33] Sorry, I have not been clear: I am not interested in the videos, only the metadata: channels, playlists, comments, descriptions, timestamps
[10:34] If nothing exists, I would like to do something on my own
[10:34] If something exists, I am curious about the implementation
[10:34] I haven't heard of anyone archiving youtube metadata
[10:35] ok
[10:36] my raw idea would be to organize everything and be able to request resources the same way you request the YT API
[10:36] is the youtube API any good? I would want to just run SQL queries against the database
[10:37] I am thinking about mirroring, kind of, so that the same code can be used against the archive
[10:38] elise: Like, metadata of the videos?
[10:38] yes
[10:38] *** hook54321 sets mode: +o ivan
[10:38] well, it's broader, it's any resources that are not videos
[10:39] tubeup (a common set of scripts that people oftentimes use to upload videos to archive.org) uploads at least some metadata relating to videos.
[10:44] I already have some bits of code to request resources from the API, I am currently thinking about how to organize all of this to query it easily
[10:45] also, wanted your input as archivers on my idea of mirroring
[10:45] I don't know if you would use this kind of solution to archive databases that are accessible via a public API
[11:02] *** DragonMon has quit IRC (Quit: Leaving)
[11:05] what kind of rate limits do you get on their public API?
[11:37] arkiver: I'll stop my grab then?
[11:37] PurpleSym: maybe wait until this project is successful
[11:37] Ok.
[11:37] in case something goes wrong
[11:37] My last estimate was ~5 TB, btw.
[11:37] 5 TB is fine too
[11:39] all numbers before the URLs give me 722417136690
[11:39] which is ~722 GB
[11:39] assuming the numbers in front are sizes?
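(Returning to the YouTube metadata question above: a minimal sketch, not an existing ArchiveTeam project, of pulling per-video metadata from the YouTube Data API v3 `videos` endpoint. The API key and video ID are placeholders; a real crawl would also walk channels, playlists and comment threads, and would be bounded by the API quota discussed further down.)

```python
# Minimal sketch (not an existing AT project): fetch per-video metadata from
# the YouTube Data API v3. YT_API_KEY and the video ID are placeholders.
import json
import requests

YT_API_KEY = "YOUR_API_KEY"  # hypothetical key
API_URL = "https://www.googleapis.com/youtube/v3/videos"

def fetch_video_metadata(video_ids):
    """Return snippet + statistics for up to 50 video IDs per request."""
    resp = requests.get(API_URL, params={
        "part": "snippet,statistics",
        "id": ",".join(video_ids),
        "key": YT_API_KEY,
    }, timeout=30)
    resp.raise_for_status()
    return resp.json().get("items", [])

if __name__ == "__main__":
    for item in fetch_video_metadata(["dQw4w9WgXcQ"]):
        # Store the raw JSON; it can be loaded into SQL later for querying.
        print(json.dumps(item))
```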
[11:40] No, the first number in audiourls.txt is the original song ID.
[11:40] ah, oops
[11:41] but 5 TB should be fine too, yeah
[11:41] I'm not sure whether filenames are unique, so I'm saving to {id}.{ext}
[11:41] I see
[11:44] How many files did you combine into one item, arkiver?
[11:45] 5 URLs/item
[11:45] all items are in the tracker now
[11:49] Any chance I can get access to the rsync target once the grab is finished? Moving 5 TB from IA to Europe and back is quite painful.
[11:49] it's already going to IA now
[11:49] to FOS
[11:50] Yeah, it would be easier if I extracted the WARCs directly on FOS to upload individual items. (If that's still something we want.)
[11:51] I could let a script run on IA servers to process the WARCs when they are in IA
[11:51] as in, when they are in items on IA
[11:51] Sure, whatever works.
[11:52] I think we can sort something out
[11:52] Are you an IA employee nowadays, arkiver?
[11:54] yep
[12:05] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[12:07] *** BlueMax has joined #archiveteam-bs
[12:08] *** DragonMon has joined #archiveteam-bs
[12:39] *** elise has quit IRC (hub.efnet.us irc.efnet.nl)
[12:39] *** godane has quit IRC (hub.efnet.us irc.efnet.nl)
[12:43] *** elise_ has joined #archiveteam-bs
[13:01] *** godane has joined #archiveteam-bs
[13:01] *** svchfoo1 sets mode: +o godane
[13:12] *** schbirid has joined #archiveteam-bs
[14:00] ivan: 1 million reads per day
[14:05] *** SilSte has joined #archiveteam-bs
[14:16] What do you think of archiving websites where images can be stored?
[14:16] For example, websites used to host images on forums, etc.
[14:16] Also for websites that are not currently in danger
[14:19] It would be great if it happened.
[14:20] I know most of the free services used for this delete images after a while.
[14:50] *** BlueMax has quit IRC (Leaving)
[15:09] PurpleSym: I'm uploading my downloads to https://archive.org/download/tyzoid-acidplanet-audio
[15:10] Ok, tyzoid. arkiver: ↑
[15:20] arkiver / tapedrive: There was that one image hosting service that was really popular ~2002, which expires images after a while
[15:20] can't remember the name of the service, but I think it's a good idea - especially on internet message boards
[15:21] blogs / etc, I'm not as concerned with the images, since they tend to be hosted first-party.
[15:21] on a forum, it's almost always 3rd party
[15:23] Photobucket?
[15:24] I think so
[15:31] PurpleSym: Do you know if there's a way to get wget to remove the leftover files once they've been added to the warc?
[15:32] or is it a matter of just pausing the download, cleaning it up, and restarting it?
[15:32] Not sure, -O /dev/null perhaps?
[15:34] Maybe, we'll see. Btw, I'm uploading in 1 GB WARC chunks
[15:37] I'm not sure if there already is one, but we could start listing image hosters
[15:38] Do we not have a category on archiveteam.org for it?
[15:40] Unrelated question: Anyone here have wiki adminship? The password recovery tool says there's no email for 'tyzoid', but account creation says the username 'tyzoid' is already in use.
[15:42] tyzoid, PurpleSym: --delete-after is the option you're looking for.
[15:43] huh, nice.
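(A sketch of the wget invocation implied by the exchange above: write responses into ~1 GB WARC chunks and drop the extracted on-disk files afterwards with --delete-after. Wrapped in Python only to keep one language throughout; the seed URL and WARC prefix are placeholders, not the actual grab command.)

```python
# Sketch of the wget invocation implied above; the seed URL and WARC prefix
# are placeholders, not the real AcidPlanet grab command.
import subprocess

cmd = [
    "wget",
    "--recursive",
    "--warc-file=acidplanet-audio",   # WARC output, placeholder prefix
    "--warc-max-size=1073741824",     # roll over into numbered WARCs at ~1 GB
    "--delete-after",                 # remove the on-disk copies once fetched
    "--no-verbose",
    "https://example.com/",           # placeholder seed URL
]
subprocess.run(cmd, check=True)
```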
[16:05] *** DoragonMo has joined #archiveteam-bs
[16:06] *** DoragonMo has quit IRC (Remote host closed the connection)
[16:17] https://pastebin.com/raw/Gh2xbSbG
[16:18] *** SilSte has quit IRC (Read error: Operation timed out)
[16:33] *** HCross has quit IRC (Read error: Connection reset by peer)
[16:34] *** HCross has joined #archiveteam-bs
[16:51] *** Asparagir has joined #archiveteam-bs
[16:52] *** svchfoo1 sets mode: +o Asparagir
[16:52] arkiver: Should I stop my download, then?
[16:52] now that we've got a warrior project?
[16:56] odemg: Once it uploads, it'll be viewable here: https://archive.fart.website/archivebot/viewer/job/36ynv
[16:57] tyzoid, I also stuck it here: https://the-eye.eu/public/Random/psvita_devnetleakv1/
[17:09] nobody even told me about acid, but I'm getting
[17:09] configure: error: lua not found
[17:09] wget-lua not successfully built.
[17:11] presuming liblua5.1-dev will fix it...
[17:11] trying
[17:12] "I'll be shutting things down and upgrading tonight; aiming to have the plugin itself installed in the next couple days." https://community.tulpa.info/thread-gdpr
[17:12] Already grabbed the whole forum a few days ago.
[17:12] More precise link: https://community.tulpa.info/thread-gdpr?pid=203752#pid203752
[17:18] tyzoid, arkiver: channel for acid?
[17:19] If we don't have #acidburns?
[17:19] Aoede / arkiver: I'm holding #acidrain, but I don't really see a need to use it
[17:20] okay, I need pizza, then I'll fire up a few more machines for this
[19:02] "Process RsyncUpload returned exit code 5 for Item[...blah...]"
[19:02] What's failing here?
[19:03] Disregard that... "max connections (120) reached"
[19:03] I just don't know on what end.
[19:15] *** wp494 has quit IRC (Ping timeout: 260 seconds)
[19:16] *** wp494 has joined #archiveteam-bs
[19:16] *** svchfoo1 sets mode: +o wp494
[19:16] *** Asparagir has quit IRC (Asparagir)
[19:19] *** wp494 has quit IRC (Client Quit)
[19:20] Jens: That's almost certainly because the connection limit on the rsync target (i.e. FOS) has been reached. Specifically, "ERROR: max connections (120) reached -- try again later".
[19:20] *** wp494 has joined #archiveteam-bs
[19:21] Reading the output of pipeline.py isn't always easy because everything from all processes is interleaved, and you often don't really know which line is coming from where.
[19:21] Plus rsync's --progress messes it up even further because you end up with multiple things on the same line.
[19:21] *** svchfoo1 sets mode: +o wp494
[19:26] Are there any more items after this round, or?
[19:28] *** chirlu has quit IRC (Ping timeout: 255 seconds)
[19:36] FOS has terrible transfer speed to EU.
[19:37] I'm getting <1 MB/s from Germany.
[19:37] >10 MB/s from communist California.
[19:40] What else is new?
[19:41] Well, it's new to me. FOS hasn't really been the bottleneck in other projects I've done.
[19:46] It's interesting how each project is bottlenecked. My SF VM is now CPU limited.
[19:49] *** t2t2 has quit IRC (Remote host closed the connection)
[19:49] By wget-lua, it seems.
[19:50] http://www.goatse.sx/img/acid.png
[19:50] And rsync.
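(On the exit code 5 seen above: rsync reports 5 for "error starting client-server protocol", which is how an rsync daemon's "max connections reached" refusal surfaces to the client. A standalone sketch, not the actual pipeline code, of the back-off-and-retry behaviour a worker can fall back to; the rsync module URL and file are placeholders.)

```python
# Not the actual pipeline code: a standalone sketch of retrying an rsync
# upload when the daemon refuses with "max connections reached", which
# surfaces as rsync exit code 5 (error starting client-server protocol).
import subprocess
import time

TARGET = "rsync://example.org/module/"   # placeholder rsync target, not FOS
RETRY_DELAY = 60                         # seconds to wait before retrying

def upload(path):
    while True:
        result = subprocess.run(["rsync", "-av", "--progress", path, TARGET])
        if result.returncode == 0:
            return
        if result.returncode == 5:
            # Daemon is full (e.g. "max connections (120) reached"); back off.
            time.sleep(RETRY_DELAY)
            continue
        raise RuntimeError(f"rsync failed with exit code {result.returncode}")

if __name__ == "__main__":
    upload("example.warc.gz")            # placeholder file
```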
[19:57] *** t2t2 has joined #archiveteam-bs
[20:02] *** schbirid has quit IRC (Quit: Leaving)
[20:50] so i have like 5 tapes to digitize from what i got from ebay
[20:50] so i did like 20 tapes since tuesday
[21:00] Jens, we don't always sync to FOS
[21:06] *** DragonMon has quit IRC (se.hub irc.underworld.no)
[21:06] *** i0npulse has quit IRC (se.hub irc.underworld.no)
[21:06] *** medowar has quit IRC (se.hub irc.underworld.no)
[21:06] *** Jens has quit IRC (se.hub irc.underworld.no)
[21:06] *** PurpleSym has quit IRC (se.hub irc.underworld.no)
[21:06] *** decay__ has quit IRC (se.hub irc.underworld.no)
[21:11] arkiver, re: archiving image hosts: I've been doing 1 TB/day of imgur for the last 4 months, is that worth putting on IA?
[21:11] I've been hashing everything for my own search-by-image tool
[21:13] it expanded upon this: https://github.com/4pr0n/irarchives - I'll host it again at some point, I let that project die
[21:19] *** decay_ has joined #archiveteam-bs
[21:28] *** Jens has joined #archiveteam-bs
[21:28] *** medowar has joined #archiveteam-bs
[21:28] *** PurpleSym has joined #archiveteam-bs
[21:28] *** decay__ has joined #archiveteam-bs
[21:28] *** Jens has quit IRC (Ping timeout: 252 seconds)
[21:28] *** decay__ has quit IRC (Ping timeout: 252 seconds)
[21:28] *** medowar has quit IRC (Ping timeout: 252 seconds)
[21:28] *** PurpleSym has left
[21:29] *** medowar has joined #archiveteam-bs
[21:30] *** i0npulse has joined #archiveteam-bs
[21:40] *** Jens has joined #archiveteam-bs
[22:14] *** Sk1d has quit IRC (Read error: Operation timed out)
[22:19] *** Sk1d has joined #archiveteam-bs
[22:20] *** bmcginty has quit IRC (Read error: Operation timed out)
[22:41] arkiver, no fucking around on this one (acid): any more items after this?
[23:28] *** BlueMax has joined #archiveteam-bs
[23:36] *** bmcginty has joined #archiveteam-bs
[23:36] *** bmcginty has quit IRC (Connection closed)
[23:38] SketchCow: did you ever upload Wizard Magazine?
[23:39] i only ask cause i found it here: http://empire-dcp-minutemen-scanss.blogspot.com/2015/11/wizard-price-guide-magazine-1991-2011.html
[23:39] but i can't find it on archive.org; not sure if you did upload them and they went dark
[23:39] anyways, grabbing it for my private collection
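(On the image hashing mentioned at 21:11: not odemg's actual tooling, but a sketch of the kind of hashing a search-by-image index like irarchives can be built on: a SHA-256 catches exact duplicates, while a simple 8x8 average hash catches resized or recompressed copies. Requires Pillow; the image path is a placeholder.)

```python
# Not odemg's actual tooling: a sketch of hashing for a search-by-image index.
# SHA-256 for exact dedup, 8x8 average hash ("ahash") for near-duplicates.
import hashlib
from PIL import Image

def sha256_of(path):
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def average_hash(path):
    """64-bit perceptual hash: shrink to 8x8 grayscale, threshold on the mean."""
    img = Image.open(path).convert("L").resize((8, 8))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = "".join("1" if p > mean else "0" for p in pixels)
    return f"{int(bits, 2):016x}"

if __name__ == "__main__":
    path = "example.jpg"                 # placeholder image
    print(sha256_of(path), average_hash(path))
```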