#archiveteam-bs 2018-05-27,Sun

Time Nickname Message
00:00 fie has quit IRC (Ping timeout: 268 seconds)
00:18 Asparagir has quit IRC (Asparagir)
00:27 BlueMax has joined #archiveteam-bs
00:28 Mateon1 has quit IRC (Remote host closed the connection)
00:38 Mateon1 has joined #archiveteam-bs
00:40 Jusque has joined #archiveteam-bs
00:44 Jusque_ has joined #archiveteam-bs
00:47 fie has joined #archiveteam-bs
00:54 Jusque has quit IRC (Read error: Operation timed out)
00:54 Jusque_ is now known as Jusque
01:11 kisspunch has quit IRC (Ping timeout: 260 seconds)
01:14 kisspunch has joined #archiveteam-bs
02:06 ta9le has quit IRC (Quit: Connection closed for inactivity)
02:16 BlueMax has quit IRC (Leaving)
02:54 BlueMax has joined #archiveteam-bs
02:57 Sk1d has quit IRC (Read error: Operation timed out)
03:00 Sk1d has joined #archiveteam-bs
03:19 qw3rty115 has joined #archiveteam-bs
03:22 odemg has quit IRC (Ping timeout: 260 seconds)
03:23 qw3rty114 has quit IRC (Read error: Operation timed out)
03:34 odemg has joined #archiveteam-bs
04:36 Asparagir has joined #archiveteam-bs
04:37 svchfoo1 sets mode: +o Asparagir
04:39 bsmith093 so, my fanfic dumps got dmca'd.
04:39 bsmith093 just ffnet, not ao3
04:51 tapedrive bsmith093: which item? I can still access https://archive.org/download/fanfictiondotnet_repack and https://archive.org/download/Fanfictiondotnet1011dump fine.
04:52 bsmith093 those and the smaller 100k-links-at-a-time ones.
04:52 bsmith093 should i just keeps going?
04:52 bsmith093 *keep
04:52 tapedrive Do you have them somewhere else?
04:53 bsmith093 not really, once i found out i grabbed them asap, where's a good place?
05:19 godane SketchCow: a lot of recently uploaded items got darked: https://archive.org/details/@seanpaulk
05:19 godane there
05:20 godane all i got from one of the items for why it's dark is this: Darked using /home/jake/scripts/lock_delete_darke_user.py
05:21 godane https://catalogd.archive.org/log/900209219
05:21 godane maybe the malware script darked them but not sure
05:23 godane for all the items that got darked: http://archive.org/metamgr.php?&w_uploader=seanpaulk@hotmail.com&mode=more
05:24 godane even the guy's fav-seanpaulk collection got darked
05:27 hook54321 godane: Is there a way for me to get a list of what items of mine are darked?
05:28 godane using the email address for your account, yes
05:29 hook54321 I tried putting my address in there and still got "not authorized to access this service"
05:30 godane what is your user name?
05:30 hook54321 hook54321a, dm me what it says is darked though.
05:35 godane my account size: size: 131,113,480,968 KB
05:36 godane 131.1TB
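The conversion godane quotes above can be checked with a short sketch. It assumes the size figure is in decimal kilobytes (1 KB = 1000 bytes), which is the reading that reproduces the 131.1TB he reports; the function name is only illustrative.

```python
def kb_to_tb(kilobytes: int) -> float:
    """Convert decimal kilobytes to decimal terabytes (1 TB = 10**9 KB)."""
    return kilobytes / 10**9


# Figure quoted in the log; the decimal-unit interpretation is an assumption.
account_kb = 131_113_480_968
print(f"{kb_to_tb(account_kb):.1f} TB")  # prints "131.1 TB"
```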
05:37 hook54321 What exactly is the Meta-manager used for?
05:38 godane i think maybe you can't see it cause you're not admin
05:38 godane anyways i can see your items
05:38 godane http://archive.org/metamgr.php?&w_uploader=hook54321a@gmail.com&mode=more
05:39 godane anyways you got 10 items that are dark
05:40 godane 2 are in test_collectrion
05:40 godane *test_collection
05:51 godane my full item count including darked ones is: 1,416,872
05:55 DragonMon has joined #archiveteam-bs
06:07 Mateon1 has quit IRC (Read error: Operation timed out)
06:07 Mateon1 has joined #archiveteam-bs
06:10 Asparagir has quit IRC (Asparagir)
07:41 omglolbah has quit IRC (Ping timeout: 268 seconds)
09:46 ta9le has joined #archiveteam-bs
09:49 Sk1d has quit IRC (Read error: Operation timed out)
09:50 Sk1d has joined #archiveteam-bs
10:09 elise has joined #archiveteam-bs
10:12 elise has quit IRC (Quit: leaving)
10:13 elise has joined #archiveteam-bs
10:13 elise 08:59 < elise> Hello archivers!
10:13 elise 08:59 < elise> I have a question regarding youtube content
10:13 elise 09:00 < elise> more specifically metadata, not the videos content per se
10:13 elise 09:00 < elise> I saw that there is an ongoing effort regarding the videos, but is there something similar for metadata?
10:13 elise 09:01 < elise> What are the tools used for the organization of metadata?
10:16 ivan would you like to start a project to archive all of youtube's metadata? I can provide design assistance
10:16 ivan it's just a few billion public videos so it should fit on a server
10:20 ivan well, maybe more than one if you're grabbing video descriptions and comments and such
10:22 ivan keeping track of all this metadata would help a lot with figuring out what to archive and keeping track of unlisted videos
10:30 omglolbah has joined #archiveteam-bs
10:33 elise Sorry, I have not been clear: I am not interested in the videos, only the metadata: channels, playlists, comments, descriptions, timestamps
10:34 elise If nothing exists, I would like to do something on my own
10:34 elise If something exists, I am curious about the implementation
10:34 ivan I haven't heard of anyone archiving youtube metadata
10:35 elise ok
10:36 elise my rough idea would be to actually organize everything and be able to request resources the same way you request them from the YT API
10:36 ivan is the youtube API any good? I would want to just run SQL queries against the database
10:37 elise I am thinking about a kind of mirroring, so that the same code can be used against the archive
10:38 hook54321 elise: Like, metadata of the videos?
10:38 elise yes
10:38 hook54321 sets mode: +o ivan
10:38 elise well, it's broader, it's any resources that are not videos
10:39 hook54321 tubeup (a common set of scripts that people oftentimes use to upload videos to archive.org) uploads at least some metadata relating to videos.
10:44 elise I already have some bits of code to request resources from the API, I am currently thinking about how to organize all of this to query it easily
10:45 elise also, I wanted your input as archivers on my idea of mirroring
10:45 elise I don't know if you would use this kind of solution to archive databases that are accessible through a public API
11:02 DragonMon has quit IRC (Quit: Leaving)
11:05 ivan what kind of rate limits do you get on their public API?
11:37 PurpleSym arkiver: I’ll stop my grab then?
11:37 arkiver PurpleSym: maybe wait until this project is successful
11:37 PurpleSym Ok.
11:37 arkiver in case something goes wrong
11:37 PurpleSym My last estimate was ~5 TB, btw.
11:37 arkiver 5 TB is fine too
11:39 arkiver all the numbers before the URLs add up to 722417136690
11:39 arkiver which is ~722GB
11:39 arkiver assuming the numbers in front are sizes?
11:40 PurpleSym No, the first number in audiourls.txt is the original song ID.
11:40 arkiver ah oops
11:41 arkiver but 5 TB should be fine too, yeah
11:41 PurpleSym I’m not sure whether filenames are unique, so I’m saving to {id}.{ext}
11:41 arkiver I see
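PurpleSym's {id}.{ext} naming can be illustrated with a minimal sketch. The layout of audiourls.txt is not shown in the log, so the "song ID, whitespace, URL" line format used here is an assumption, as are the function name and the example URL.

```python
import posixpath
from urllib.parse import urlsplit


def target_name(line: str) -> str:
    """Build an '{id}.{ext}' file name from one 'ID URL' line (assumed format)."""
    song_id, url = line.split(maxsplit=1)
    # Take the extension from the URL path only, ignoring any query string.
    ext = posixpath.splitext(urlsplit(url.strip()).path)[1]
    return f"{song_id}{ext}"


# Hypothetical input line; two songs with the same original file name
# still get distinct targets because the ID differs.
print(target_name("12345 http://example.com/media/song.mp3"))  # prints "12345.mp3"
```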
11:44 PurpleSym How many files did you combine into one item, arkiver?
11:45 arkiver 5 URLs/item
11:45 arkiver all items are in the tracker now
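Grouping URLs into tracker items of five, as arkiver describes, could look roughly like the sketch below. This is not the project's actual item-generation code; the helper name and the placeholder URLs are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(urls: Iterable[str], size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs; the last may be shorter."""
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Placeholder URLs, 12 in total, so the final item holds only 2.
urls = [f"http://example.com/audio/{n}" for n in range(12)]
for item in batched(urls):
    print(len(item), item[0])
```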
11:49 PurpleSym Any chance I can get access to the rsync target once the grab has finished? Moving 5 TB from IA to Europe and back is quite painful.
11:49 arkiver it's already going to IA now
11:49 arkiver to FOS
11:50 PurpleSym Yeah, it would be easier if I extracted the WARCs directly on FOS to upload individual items. (If that’s still something we want.)
11:51 arkiver I could let a script run on IA servers to process the WARCs when they are in IA
11:51 arkiver as in when they are in items on IA
11:51 PurpleSym Sure, whatever works.
11:52 arkiver I think we can sort something out
11:52 PurpleSym Are you an IA employee nowadays, arkiver?
11:54 arkiver yep
12:05 BlueMax has quit IRC (Read error: Connection reset by peer)
12:07 BlueMax has joined #archiveteam-bs
12:08 DragonMon has joined #archiveteam-bs
12:39 elise has quit IRC (hub.efnet.us irc.efnet.nl)
12:39 godane has quit IRC (hub.efnet.us irc.efnet.nl)
12:43 elise_ has joined #archiveteam-bs
13:01 godane has joined #archiveteam-bs
13:01 svchfoo1 sets mode: +o godane
13:12 schbirid has joined #archiveteam-bs
14:00 elise_ ivan: 1 million reads per day
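To put elise_'s quota next to ivan's "few billion public videos", here is a back-of-the-envelope sketch. The figure of 3 billion videos and the cost of one API read per video are both assumptions chosen for illustration; real requests may cost more or less quota.

```python
def days_to_crawl(total_items: int, reads_per_day: int, reads_per_item: int = 1) -> int:
    """Whole days needed to read every item once under a daily read quota."""
    total_reads = total_items * reads_per_item
    return -(-total_reads // reads_per_day)  # ceiling division on integers


videos = 3_000_000_000   # assumed stand-in for "a few billion"
quota = 1_000_000        # reads per day, as quoted in the log
days = days_to_crawl(videos, quota)
print(days, "days, about", round(days / 365, 1), "years")  # 3000 days, about 8.2 years
```

Under these assumptions a single API key would need years, which suggests that spreading the work over many keys or workers would matter more than storage.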
14:05 SilSte has joined #archiveteam-bs
14:16 arkiver What do you think of archiving websites where images can be stored?
14:16 arkiver For example, websites used to host images for forums, etc.
14:16 arkiver Also for websites that are not currently in dange
14:16 arkiver danger*
14:19 tapedrive It would be great if it happened.
14:20 tapedrive I know most of the free services used delete images after a while.
14:50 BlueMax has quit IRC (Leaving)
15:09 tyzoid PurpleSym: I'm uploading my downloads to https://archive.org/download/tyzoid-acidplanet-audio
15:10 PurpleSym Ok, tyzoid. arkiver: ↑
15:20 tyzoid arkiver / tapedrive: There was that one image hosting service that was really popular ~2002, which expires images after a while
15:20 tyzoid can't remember the name of the service, but I think it's a good idea - especially on internet message boards
15:21 tyzoid blogs / etc, I'm not as concerned with the images, since they tend to be hosted first-party.
15:21 tyzoid on a forum, it's almost always 3rd party
15:23 eientei95 Photobucket?
15:24 tyzoid I think so
15:31 tyzoid PurpleSym: Do you know if there's a way to get wget to remove the leftover files once they've been added to the warc?
15:32 tyzoid or is it a matter of just pausing the download, cleaning it up, and restarting it?
15:32 PurpleSym Not sure, -O /dev/null perhaps?
15:34 tyzoid Maybe, we'll see. Btw, I'm uploading in 1 GB WARC chunks
15:37 arkiver I'm not sure if there already is one, but we could start listing image hosters
15:38 tyzoid Do we not have a category on archiveteam.org for it?
15:40 tyzoid Unrelated question: Anyone here have wiki adminship? The password recovery tool says there's no email for 'tyzoid', but account creation says the username 'tyzoid' is already in use.
15:42 JAA tyzoid, PurpleSym: --delete-after is the option you're looking for.
15:43 tyzoid huh, nice.
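JAA's suggestion can be sketched as a small command builder. It combines `--delete-after` with GNU Wget's `--input-file` and `--warc-file` options; the file names are hypothetical, and this is not the exact command tyzoid ran.

```python
import shlex
from typing import List


def build_wget_command(url_list: str, warc_name: str) -> List[str]:
    """Assemble a wget invocation that records to WARC and removes local copies."""
    return [
        "wget",
        f"--input-file={url_list}",   # read URLs to fetch from a file
        f"--warc-file={warc_name}",   # write responses to a WARC with this prefix
        "--delete-after",             # delete each downloaded file once fetched
    ]


# Hypothetical file names, for illustration only.
cmd = build_wget_command("audiourls-only.txt", "acidplanet-audio")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd)` directly, which avoids shell quoting problems with unusual file names.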
16:05 DoragonMo has joined #archiveteam-bs
16:06 DoragonMo has quit IRC (Remote host closed the connection)
16:17 odemg https://pastebin.com/raw/Gh2xbSbG
16:18 SilSte has quit IRC (Read error: Operation timed out)
16:33 HCross has quit IRC (Read error: Connection reset by peer)
16:34 HCross has joined #archiveteam-bs
16:51 Asparagir has joined #archiveteam-bs
16:52 svchfoo1 sets mode: +o Asparagir
16:52 tyzoid arkiver: Should I stop my download, then?
16:52 tyzoid now that we've got a warrior project?
16:56 tyzoid odemg: Once it uploads, it'll be viewable here: https://archive.fart.website/archivebot/viewer/job/36ynv
16:57 odemg tyzoid, I also stuck it here: https://the-eye.eu/public/Random/psvita_devnetleakv1/
17:09 odemg nobody even told me about acid, but I'm getting
17:09 odemg configure: error: lua not found
17:09 odemg wget-lua not successfully built.
17:11 odemg presuming liblua5.1-dev will fix...
17:11 odemg trying
17:12 lindalap "I'll be shutting things down and upgrading tonight; aiming to have the plugin itself installed in the next couple days." https://community.tulpa.info/thread-gdpr
17:12 lindalap Already grabbed the whole forum a few days ago.
17:12 lindalap More precise link: https://community.tulpa.info/thread-gdpr?pid=203752#pid203752
17:18 odemg tyzoid, arkiver channel for acid?
17:19 odemg If we don't have #acidburns ?
17:19 Aoede <tyzoid> Aoede / arkiver: I'm holding #acidrain, but I don't really see a need to use it
17:20 odemg okay, I need pizza then will fire up a few more machines for this
19:02 Jens "Process RsyncUpload returned exit code 5 for Item[...blah...]"
19:02 Jens What's failing here?
19:03 Jens Disregard that... "max connections (120) reached"
19:03 Jens I just don't know on what end.
19:15 wp494 has quit IRC (Ping timeout: 260 seconds)
19:16 wp494 has joined #archiveteam-bs
19:16 svchfoo1 sets mode: +o wp494
19:16 Asparagir has quit IRC (Asparagir)
19:19 wp494 has quit IRC (Client Quit)
19:20 JAA Jens: That's almost certainly because the connection limit on the rsync target (i.e. FOS) has been reached. Specifically, "ERROR: max connections (120) reached -- try again later".
19:20 wp494 has joined #archiveteam-bs
19:21 JAA Reading the output of pipeline.py isn't always easy because everything from all processes is interleaved and you often don't really know where which line is coming from.
19:21 JAA Plus rsync's --progress messes it up even further because you end up with multiple things on the same line.
19:21 svchfoo1 sets mode: +o wp494
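The "try again later" advice in the rsync error can be sketched as a generic retry loop with exponential backoff. This is not how pipeline.py handles failed uploads; the function, its parameters, and the 30-second base delay are all assumptions for illustration.

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], int],
    max_tries: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `attempt` until it returns exit code 0 or `max_tries` is used up.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between failed tries
    and returns the last exit code seen.
    """
    code = -1
    for n in range(max_tries):
        code = attempt()
        if code == 0:
            break
        if n < max_tries - 1:
            sleep(base_delay * 2 ** n)
    return code


# Simulated upload that fails twice with exit code 5, then succeeds.
codes = iter([5, 5, 0])
print(retry_with_backoff(lambda: next(codes), sleep=lambda s: None))  # prints 0
```

In real use `attempt` would wrap something like `subprocess.run([...]).returncode` for the upload command. Spacing out retries also avoids many workers hitting a full target at the same moment.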
19:21 svchfoo1 sets mode: +o wp494
19:26 odemg Are there any more items after this round or?
19:28 chirlu has quit IRC (Ping timeout: 255 seconds)
19:36 Jens FOS has terrible transfer speed to EU.
19:37 Jens I'm getting <1 MB/s from Germany.
19:37 Jens >10 MB/s from communist California.
19:40 JAA What else is new?
19:41 Jens Well, it's new to me. FOS hasn't really been the bottleneck in other projects I've done.
19:46 Jens It's interesting how each project is bottlenecked. My SF VM is now CPU limited.
19:49 t2t2 has quit IRC (Remote host closed the connection)
19:49 Jens By wget-lua it seems.
19:50 Jens http://www.goatse.sx/img/acid.png
19:50 Jens And rsync.
19:57 t2t2 has joined #archiveteam-bs
20:02 schbirid has quit IRC (Quit: Leaving)
20:50 godane so i have like 5 tapes to digitize from what i got from ebay
20:50 godane so i did like 20 tapes since tuesday
21:00 odemg Jens, we don't always sync to FOS
21:06 DragonMon has quit IRC (se.hub irc.underworld.no)
21:06 i0npulse has quit IRC (se.hub irc.underworld.no)
21:06 medowar has quit IRC (se.hub irc.underworld.no)
21:06 Jens has quit IRC (se.hub irc.underworld.no)
21:06 PurpleSym has quit IRC (se.hub irc.underworld.no)
21:06 decay__ has quit IRC (se.hub irc.underworld.no)
21:11 odemg arkiver, re archiving image hosts: I've been doing 1TB/day of imgur for the last 4 months, is that worth putting on IA?
21:11 odemg I've been hashing everything for my own search-by-image tool
21:13 odemg expanded upon this, https://github.com/4pr0n/irarchives I'll host it again at some point, I let that project die
21:19 decay_ has joined #archiveteam-bs
21:28 Jens has joined #archiveteam-bs
21:28 medowar has joined #archiveteam-bs
21:28 PurpleSym has joined #archiveteam-bs
21:28 decay__ has joined #archiveteam-bs
21:28 Jens has quit IRC (Ping timeout: 252 seconds)
21:28 decay__ has quit IRC (Ping timeout: 252 seconds)
21:28 medowar has quit IRC (Ping timeout: 252 seconds)
21:28 PurpleSym has left
21:29 medowar has joined #archiveteam-bs
21:30 i0npulse has joined #archiveteam-bs
21:40 Jens has joined #archiveteam-bs
22:14 Sk1d has quit IRC (Read error: Operation timed out)
22:19 Sk1d has joined #archiveteam-bs
22:20 bmcginty has quit IRC (Read error: Operation timed out)
22:41 odemg arkiver, no fucking around on this one (acid) any more items after this?
23:28 BlueMax has joined #archiveteam-bs
23:36 bmcginty has joined #archiveteam-bs
23:36 bmcginty has quit IRC (Connection closed)
23:38 godane SketchCow: did you ever upload Wizard Magazine?
23:39 godane i only have it cause i found it here: http://empire-dcp-minutemen-scanss.blogspot.com/2015/11/wizard-price-guide-magazine-1991-2011.html
23:39 godane but i can't find it on archive.org, not sure if you did upload it and it then went dark
23:39 godane anyways grabbing for my private collection
