#archiveteam-bs 2018-09-18,Tue


Time Nickname Message
00:40 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
00:58 🔗 AlbardinG is now known as flashback
01:03 🔗 flashback is now known as Flashback
01:18 🔗 omglolbah has quit IRC (Read error: Operation timed out)
01:23 🔗 omglolbah has joined #archiveteam-bs
01:40 🔗 adinbied moufu - thanks so much! That worked! Also, no problem schbirid! I'm now running into an issue where wget isn't grabbing the photos and other content on the page, but just the html. I'm sure I'm missing something obvious, but I can't figure out what it is
01:51 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
01:51 🔗 BartoCH has joined #archiveteam-bs
01:52 🔗 BlueMaxim has joined #archiveteam-bs
01:52 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
01:53 🔗 BlueMaxim has joined #archiveteam-bs
01:55 🔗 BlueMax has quit IRC (Ping timeout: 633 seconds)
02:01 🔗 albardin has joined #archiveteam-bs
02:10 🔗 w0rmhole godane: nice! how'd you do it? did you use a really fancy scanner? were you able to keep the magazines intact (aka w/o cutting the pages out)
02:11 🔗 w0rmhole last question, what's the source format? i want to download it in the best quality possible
02:12 🔗 Flashfire JPEG definitely JPEG
02:12 🔗 Flashfire and .txt files
02:14 🔗 Flashfire W0rmhole I know all of the file types
02:15 🔗 w0rmhole -_-
02:15 🔗 w0rmhole you mean .bmp?
02:15 🔗 w0rmhole 16 colour
02:15 🔗 Flashfire Yes thats what I meant
02:16 🔗 Flashfire and XBM files
02:50 🔗 godane i use JPEG with 90% compression
02:51 🔗 godane i do have a fancy scanner and the books are kept intact
02:51 🔗 godane its a plustek opticbook 4800
02:52 🔗 godane whats funny is that i had the scanner given to me in early 2013 by Jason
02:52 🔗 godane i think i only scanned 17 things before cause i had to go to windows to use it cause there is no linux driver
02:54 🔗 godane best part is it was best scanner of 2011 on a linux site: linux.sys-con.com/node/2068241
02:55 🔗 godane in case you can view the website: https://web.archive.org/web/20131201213438/linux.sys-con.com/node/2068241
02:55 🔗 godane *can't view the website
02:57 🔗 godane a part of me thought that was always weird cause if there is no linux drivers then how is it the best scanner of the year on a linux site :P
03:20 🔗 godane SketchCow: btw i need those return labels so i can buy more tapes on ebay using my patreon money
03:20 🔗 godane send more than 12 labels also please
03:21 🔗 godane cause i want to send ALL of the boxes and then some afterwards
03:23 🔗 godane also if possible put return labels in future boxes so we don't have this problem again
06:07 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
06:07 🔗 Mateon1 has joined #archiveteam-bs
06:11 🔗 omglolbah has quit IRC (Ping timeout: 268 seconds)
06:13 🔗 svchfoo1 has quit IRC (Ping timeout: 268 seconds)
06:14 🔗 svchfoo1 has joined #archiveteam-bs
06:15 🔗 svchfoo3 sets mode: +o svchfoo1
06:15 🔗 omglolbah has joined #archiveteam-bs
06:46 🔗 Stilett0 has joined #archiveteam-bs
07:02 🔗 svchfoo1 has quit IRC (Ping timeout: 268 seconds)
07:04 🔗 kiskabak has quit IRC (se.hub efnet.portlane.se)
07:04 🔗 Kaz has quit IRC (se.hub efnet.portlane.se)
07:09 🔗 svchfoo1 has joined #archiveteam-bs
07:09 🔗 kiskabak has joined #archiveteam-bs
07:09 🔗 Kaz has joined #archiveteam-bs
07:09 🔗 svchfoo3 sets mode: +o svchfoo1
07:10 🔗 t2t2 has quit IRC (Read error: Operation timed out)
07:11 🔗 t2t2 has joined #archiveteam-bs
08:48 🔗 t2t2 has quit IRC (Quit: No Ping reply in 210 seconds.)
08:49 🔗 BartoCH has quit IRC (Quit: WeeChat 2.2)
08:50 🔗 BartoCH has joined #archiveteam-bs
08:50 🔗 t2t2 has joined #archiveteam-bs
08:50 🔗 Stilett0 is now known as Stiletto
09:25 🔗 antomatic has joined #archiveteam-bs
09:25 🔗 swebb sets mode: +o antomatic
09:27 🔗 antomati_ has quit IRC (Read error: Operation timed out)
10:35 🔗 wp494 has quit IRC (Ping timeout: 492 seconds)
10:37 🔗 wp494 has joined #archiveteam-bs
10:38 🔗 icedice has joined #archiveteam-bs
11:11 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:47 🔗 icedice has quit IRC (Quit: Leaving)
13:10 🔗 bitBaron has joined #archiveteam-bs
13:51 🔗 Nicu has joined #archiveteam-bs
13:53 🔗 Nicu How about attempt archiving Tripod.Lycos?
13:55 🔗 PurpleSym Nicu: We *do* accept donations: https://opencollective.com/archiveteam
13:56 🔗 kiska Nicu How would we discover hosted services?
13:58 🔗 eientei95 has joined #archiveteam-bs
13:58 🔗 Nicu manually in case it's not yet ready for automatic work?
13:59 🔗 kiska Anyway eientei95 is here from your previous discussion, continue
14:00 🔗 Nicu so i think you could expand
14:00 🔗 Nicu it's a great cause
14:01 🔗 Nicu but there's not much info
14:01 🔗 Nicu about what has been done
14:01 🔗 Nicu or it's not readily available
14:02 🔗 Nicu also, we understand how valuable the stored information is, but what could one do with it in the future?
14:04 🔗 kiska Unfortunately I can't find any reference to any sites stored on tripod.lycos through their robots.txt and sitemap
14:04 🔗 kiska Any site crawl we do ends up on the Wayback Machine
14:06 🔗 Nicu tripod was the homologue of GeoCities right
14:06 🔗 Nicu doesn't it make sense to occupy that too?
14:06 🔗 Nicu that's where i'm coming from
14:07 🔗 JAA kiska: There's http://members.tripod.com/robots.txt . I'm not really sure what the relation between tripod.com and tripod.lycos.com is.
14:08 🔗 JAA Nicu: We usually only go after sites when they are shutting down or removing content or in immediate risk. I agree though that it would be nice to grab Tripod in its entirety.
14:09 🔗 kiska I guess we could chuck a few at a time into #archivebot
14:09 🔗 kiska And we might as well grab angelfire as well since they're the same company (I think)
14:10 🔗 JAA Also, I went through the IRC logs. While Tripod has been brought up several times, I don't see anything regarding a systematic archival.
14:10 🔗 JAA Angelfire is in progress over at #angelonfire .
14:10 🔗 Nicu that's good. If I help to copy how do I know that my work will be safe and useful in the future?
14:10 🔗 JAA s/in progress/being worked on/
14:11 🔗 Nicu is it stored in a data center
14:11 🔗 JAA It is stored at the Internet Archive.
14:11 🔗 Nicu that will "work for human kind" in the future?
14:11 🔗 Nicu do you have an agreement or is it an open kind of thing
14:11 🔗 JAA IA has their own DC in San Francisco.
14:11 🔗 JAA There are several people from IA in ArchiveTeam.
14:12 🔗 Nicu ok
14:12 🔗 JAA (Well, I know of two, so not sure if that counts as "several".)
14:13 🔗 JAA We also have an independent project to mirror the most important content from IA in a distributed manner. Check out IA.BAK on our wiki. (I have no idea how active that project is nowadays though.)
14:13 🔗 JAA However, IA stores around 45 PB of unique data currently, so mirroring it all is expensive.
14:14 🔗 kiska I know kiskaJDC has ~900GB of a shard
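[The scale gap here can be put in numbers with quick arithmetic, using the ~45 PB figure above and kiska's ~900 GB shard; decimal (SI) units are assumed, the real IA.BAK shard sizes may differ.]

```python
# Rough count of ~900 GB shards needed to mirror ~45 PB of unique IA data.
# Decimal (SI) units assumed throughout; actual shard sizing is an assumption.
IA_UNIQUE_BYTES = 45 * 10**15   # ~45 PB of unique data
SHARD_BYTES = 900 * 10**9       # ~900 GB per IA.BAK shard

shards_needed = IA_UNIQUE_BYTES // SHARD_BYTES
print(shards_needed)  # → 50000
```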
14:16 🔗 Nicu can i get a pass for wiki?
14:16 🔗 jrwr also Nicu we have been around a good while, If you want to archive something and a project is stalled or not even started, learn how our pipelines work on submit it to be archived as a warrior project
14:16 🔗 Nicu WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
14:16 🔗 jrwr thats the resource we are the most limited on
14:17 🔗 jrwr programmers and engineers
14:17 🔗 JAA Nicu: What do you want to do on the wiki?
14:17 🔗 Nicu expand the mindset for archiving the Internet
14:17 🔗 jrwr How so?
14:18 🔗 jrwr Lets discuss it
14:18 🔗 Nicu i'm nostalgic for how creative it used to be
14:18 🔗 Nicu homepages that expressed individuality
14:18 🔗 Nicu web design that was unexplainably beautiful
14:18 🔗 Nicu perhaps there might be a way to preserve this
14:18 🔗 jrwr How do you see us expanding the mindset for archiving the internet
14:19 🔗 Nicu also the names for irc channels that were beautiful (esp. on Undernet)
14:19 🔗 Nicu perhaps archive the IRC logs
14:19 🔗 JAA Our mindset is already "archive ALL the things". I don't think there's much to expand there. :-)
14:19 🔗 jrwr lol
14:19 🔗 eientei95 lol
14:19 🔗 Nicu but you don't do anything in this direction
14:19 🔗 Nicu how to categorize
14:19 🔗 Nicu web design under ALL?
14:19 🔗 jrwr JAA: how much per month do we shove into archivebot?
14:19 🔗 jrwr 1-2TB?
14:20 🔗 eientei95 IRC operators tend to not like channels being idled in for logs
14:20 🔗 Igloo Way more jrwr
14:20 🔗 Igloo More like 20-30+ some months
14:20 🔗 jrwr Ya
14:20 🔗 JAA jrwr: We uploaded 23.7 TiB in August.
14:20 🔗 JAA 485 TiB in total since ArchiveBot was started.
14:20 🔗 jrwr Nicu: even now: Job status: 95499 completed, 2931 aborted, 567 failed, 78 in progress, 23 pending
14:20 🔗 Nicu how to access it?
14:21 🔗 jrwr all gets uploaded to the wayback machine
14:21 🔗 jrwr and the internet archive as WARCs
14:21 🔗 jrwr so normal people can access it with the wayback machine
14:21 🔗 Nicu sounds like trash and not info that could be instructive
14:21 🔗 JAA https://archive.org/details/archivebot if you want the raw data. https://web.archive.org/ if you want to browse it.
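[The CDXes JAA mentions are plain-text indexes into the WARC files. A minimal sketch of reading one line, assuming the common 11-field layout; real ArchiveBot CDXes declare their field order in a header line, so this layout is an assumption.]

```python
# Parse one line of a CDX index (assumed 11-field layout: " CDX N b a m s k r M S V g"
# style ordering as commonly emitted by WARC tooling).
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode",
              "digest", "redirect", "robotflags", "length", "offset", "filename"]

def parse_cdx_line(line):
    return dict(zip(CDX_FIELDS, line.split()))

rec = parse_cdx_line(
    "com,example)/ 20180906155341 http://example.com/ text/html 200 "
    "AAAA1234 - - 1234 5678 example.warc.gz"
)
print(rec["timestamp"])  # → 20180906155341
```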
14:22 🔗 Igloo Aug '18 3.09 TiB / 2.57 TiB / 5.66 TiB
14:22 🔗 jrwr Well, thats the thing, how do you present the billions of webpages we have saved
14:22 🔗 kiska eg https://web.archive.org/web/20180906155341/https://oldforums.eveonline.com/ Archivebot did this about 2 weeks ago. And you can see it by clicking on "About this capture"
14:22 🔗 jrwr A eve nerd! :)
14:22 🔗 JAA Nicu: Feel free to download the CDXes from the ArchiveBot collection and build a nice interface for it. Hint, it'll be huge.
14:22 🔗 jrwr Ya, the wayback does allow for searching
14:23 🔗 JAA Yeah, but only page titles unfortunately.
14:23 🔗 jrwr but browsing is a little harder, like the old webrings (that still work!)
14:23 🔗 kiska jrwr xD And I think that upload was due to CCP being bought out by Pearl Abyss
14:23 🔗 jrwr lol
14:23 🔗 jrwr there are dozens of us, DOZENS
14:23 🔗 kiska So I chucked it into the bot, it landed on my SSD pipeline, therefore it ran out of space
14:24 🔗 jrwr Nicu: thats the hardest thing, there is a metric ton of data, stored in open formats waiting for someone to do something with it
14:24 🔗 eientei95 I'm still on the tutorial
14:24 🔗 jrwr We are here to make sure that data even exists
14:24 🔗 eientei95 Archive now, make pretty later
14:24 🔗 jrwr eientei95: Play with others, join Pandemic Horde or any of the other newbie alliances
14:24 🔗 jrwr you will love eve then
14:25 🔗 eientei95 If we make pretty now, archiving will be held back
14:25 🔗 jrwr Correct
14:25 🔗 kiska I played for just about 8 years I think
14:25 🔗 eientei95 jrwr: Will do, going to get back to actually playing games
14:25 🔗 JAA Take the Eve discussion to -ot please.
14:25 🔗 jrwr and we are open to any project/website to be archived, we have archivebot for the smaller jobs
14:25 🔗 kiska But yes make pretty now will make archival very slow
14:25 🔗 jrwr and anyone can make a pipeline / wget-lua code to archive a site when required
14:26 🔗 kiska Especially sites that use a ton of Javascript
14:26 🔗 jrwr Nicu: any thoughts so far?
14:26 🔗 eientei95 i,e. Modern websites
14:26 🔗 Nicu i'm thinking of my 2TB of free space
14:26 🔗 Nicu and that they can't be useful
14:26 🔗 Nicu since it's stored in IA?
14:26 🔗 eientei95 "can't be useful"
14:26 🔗 eientei95 [02:25:56] <jrwr> and anyone can make a pipeline / wget-lua code to archive a site when required
14:26 🔗 Nicu or maybe use tha IA.BAK thing?
14:27 🔗 jrwr Run a warrior, run a dozen of them, docker is great for this
14:27 🔗 JAA eientei95: Warrior instances don't use much disk space and normally don't keep all the data anyway.
14:27 🔗 jrwr your part helps with archiving sites when the need arises
14:27 🔗 Nicu i feel like i am losing the motivation since it's like grabbing what you can
14:27 🔗 jrwr Can you program Nicu
14:28 🔗 JAA Yep, that. And you can join IA.BAK also if you want.
14:28 🔗 eientei95 JAA: Right, I've had a few instances where Warrior failed due to lack of disk space
14:28 🔗 jrwr IA.BAK is pretty good
14:28 🔗 jrwr always need more diskspace for that
14:28 🔗 Nicu the urge is now, for all the good things social networks and telegram destroy like hurricanes
14:28 🔗 JAA eientei95: Oh, that can certainly happen. But it doesn't need 2 TB of disk space.
14:28 🔗 Nicu and ArchiveTeam "just shoves" :-D
14:29 🔗 JAA What would you suggest we do? Ignore the sites that are shutting down all the time, and let their data be lost forever?
14:29 🔗 Nicu i am not suggesting, just saying this is too generalistic
14:29 🔗 Nicu am i wrong?
14:30 🔗 jrwr Its all we can do, there is so much to save
14:30 🔗 eientei95 WE are all but a drop in the ocean
14:30 🔗 jrwr ^^
14:30 🔗 Nicu doesn't it matter WHAT we save
14:30 🔗 Igloo We save what we we can. As much as we can
14:30 🔗 Nicu i don't want to interrupt the good work though
14:30 🔗 Nicu just putting it to debate
14:30 🔗 eientei95 Start up a pipeline, shove sites you want archived into it
14:31 🔗 jrwr Also, Anyone can upload to the internet archive
14:31 🔗 Igloo Not anymore IIRC
14:31 🔗 eientei95 ^
14:31 🔗 Igloo I think only some stuff goes into the way back
14:31 🔗 jrwr ya
14:31 🔗 JAA Anyone can upload to IA. But it only goes into the WBM if you're whitelisted.
14:31 🔗 JAA That doesn't mean that uploads that don't go into the WBM aren't useful.
14:31 🔗 jrwr but if you make good WARCs (built into wget ) save anything and everything you can, add good metadata for people to find it
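[The "built into wget" part jrwr mentions is a pair of command-line flags. A minimal sketch assembling an illustrative invocation; this is not the full ArchiveTeam flag set.]

```python
def warc_wget_args(url, warc_prefix):
    # wget (>= 1.14) writes WARC output natively; --page-requisites pulls in
    # images/CSS/JS so captures replay properly in the Wayback Machine.
    return ["wget",
            "--mirror",
            "--page-requisites",
            "--warc-file=" + warc_prefix,   # writes <prefix>.warc.gz
            "--warc-cdx",                   # also emits a CDX index
            url]

print(warc_wget_args("http://example.com/", "example"))
```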
14:32 🔗 Nicu breaking my dreams of seeing Internet not just a dump like now :-D
14:32 🔗 jrwr the wayback machine is the best method to relive that old internet
14:32 🔗 jrwr we are doing everything we can to give it as much data as we can
14:34 🔗 jrwr If you want to archive websites we are not covering, go do it then, all the knowledge we have is in the public, our wiki, our github
14:35 🔗 Nicu i know the logic tells you it's correct to run this machine of endlessly copying who knows what, but the heart and soul tells you I'm right, and you don't want to take this into consideration
14:35 🔗 dxrt- has joined #archiveteam-bs
14:35 🔗 dxrt has quit IRC (Ping timeout: 252 seconds)
14:35 🔗 jrwr Ok, you are not making any sense Nicu
14:36 🔗 jrwr Right now, What do you want us to do
14:36 🔗 jrwr Dumb it down for me
14:37 🔗 Nicu go to the darkside
14:37 🔗 hook54321 has quit IRC (Ping timeout: 252 seconds)
14:37 🔗 i0npulse has quit IRC (Ping timeout: 252 seconds)
14:38 🔗 jrwr Thats not even remotely helpful
14:38 🔗 jrwr I'm ignoring you now
14:38 🔗 adinbied Completely off-topic from the discussion going on, but does anyone know why my WGetDownload isn't grabbing images?
14:39 🔗 JAA Content gets lost all the time because websites shut down. We try to save what we can. And you're saying this isn't useful?
14:39 🔗 JAA Also, it's not ours to decide which information should be preserved. We don't know what will be useful for future historians. So "archive ALL the things".
14:40 🔗 adinbied https://github.com/adinbied/angelfire-grab/blob/master/pipeline.py#L162 is my WGetArgs & for some reason it's only grabbing the HTML & not all of the resources (gifs, pngs, jpgs, embedded content)
14:40 🔗 adinbied https://www.archiveteam.org/images/c/ce/Archive-all-the-things-thumb.jpg
14:41 🔗 Nicu has quit IRC ()
14:41 🔗 JAA adinbied: The problem is your download_child_p hook. It always returns false, meaning everything should be skipped.
14:42 🔗 JAA I think the initial URLs passed on the command line are exempt from this filtering. And so that's all it grabs.
14:42 🔗 adinbied Ah, derp. I really need to learn more Lua - that makes sense. Thanks!
14:48 🔗 adinbied Hmmm.. Getting rid of that still doesn't seem to work....
14:49 🔗 pikhq has quit IRC (se.hub irc.underworld.no)
14:49 🔗 kiska has quit IRC (se.hub irc.underworld.no)
14:49 🔗 Flashfire has quit IRC (se.hub irc.underworld.no)
14:49 🔗 w0rmhole has quit IRC (se.hub irc.underworld.no)
14:50 🔗 pikhq_ has joined #archiveteam-bs
14:52 🔗 kiska has joined #archiveteam-bs
14:52 🔗 hook54321 has joined #archiveteam-bs
14:52 🔗 w0rmhole has joined #archiveteam-bs
14:54 🔗 i0npulse has joined #archiveteam-bs
14:54 🔗 Flashfire has joined #archiveteam-bs
15:08 🔗 jrwr JAA: so I had an interesting idea
15:08 🔗 jrwr IPFS content mirroring to the IA, cache all the content we can (its easy to discover random content) and upload it to the IA or have a IA box pin content on the IPFS to save it
15:10 🔗 JAA Yeah, the latter would probably be the easiest, just get IA to pin the content and it'll live forever.
15:10 🔗 jrwr https://github.com/ipfs-search/ipfs-search
15:10 🔗 jrwr looks like you can sniff the DHT traffic to find content
15:11 🔗 jrwr since content that doesn't get pinned will get purged after some time
15:11 🔗 jrwr or not accessed
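[Pinning, as discussed above, goes through the IPFS daemon's HTTP API. A sketch that only builds the request URL for go-ipfs's /api/v0/pin/add endpoint; the localhost host/port are the daemon defaults and an assumption here.]

```python
from urllib.parse import urlencode

def pin_add_url(cid, api_base="http://127.0.0.1:5001"):
    # go-ipfs exposes pinning at /api/v0/pin/add; pinned blocks survive
    # garbage collection, unpinned ones are eventually purged - which is
    # exactly the retention problem described above.
    return api_base + "/api/v0/pin/add?" + urlencode({"arg": cid})

print(pin_add_url("QmExampleCid"))
# → http://127.0.0.1:5001/api/v0/pin/add?arg=QmExampleCid
```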
15:22 🔗 JAA Is there any estimate how big IPFS is?
15:27 🔗 jrwr ~Lots~
15:27 🔗 jrwr https://github.com/victorbjelkholm/ipfscrape
15:27 🔗 jrwr JAA: interesting project
15:27 🔗 jrwr saves webpages and stores them into ipfs
15:32 🔗 JAA Interesting indeed.
15:38 🔗 jrwr like right now
15:39 🔗 jrwr im shoving my entire dataset (about 20k files) into IPFS
15:39 🔗 jrwr so my users can use it, since they like having the entire dataset at times
15:47 🔗 Jon has quit IRC (Read error: Operation timed out)
15:50 🔗 jmtd has joined #archiveteam-bs
16:22 🔗 icedice has joined #archiveteam-bs
16:25 🔗 jrwr wonder who we poke to do something like that :)
16:25 🔗 adinbied arkiver, I'm sure you are insanely busy (I know life gets in the way) - whenever you can, would you be able to get the Angelfire project setup (ie getting the tracker setup and the github repo initialized) and looking over the quizlet target and giving the OK to proceed?
16:45 🔗 astrid that joker "hello_" /msg'd me and demanded pictures
16:45 🔗 astrid buddy, i grew up online and my girlfriend does porn. this isn't my first rodeo.
16:52 🔗 adinbied OK, so it seems my issue is in the Lua - how do I specify the if string match *.jpg,*.png,*.gif then add to url queue
16:54 🔗 adinbied In the wget callbacks get urls function I need to specify in broad general terms that if an image is found in the HTML of the page, then to add it to grab
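[The matching adinbied describes can be sketched as a plain regex over the page HTML. This is a standalone Python illustration of the filtering logic only; in the real project it would live in the wget-lua get_urls callback, and the regex is a simplification of proper HTML-aware link extraction.]

```python
import re

# Pull src/href values ending in common image extensions out of raw HTML.
# Hypothetical pattern for illustration - a real crawler should rely on
# wget's own link extraction or an HTML parser instead.
IMG_RE = re.compile(r'''(?:src|href)=["']([^"']+?\.(?:jpe?g|png|gif))["']''', re.I)

def image_urls(html):
    return IMG_RE.findall(html)

html = '<img src="/pics/cat.GIF"><a href="banner.png">x</a><img src="logo.jpg">'
print(image_urls(html))  # → ['/pics/cat.GIF', 'banner.png', 'logo.jpg']
```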
16:56 🔗 Dimtree has joined #archiveteam-bs
17:28 🔗 Dimtree has quit IRC (Peace)
17:44 🔗 Dimtree has joined #archiveteam-bs
17:52 🔗 Dimtree has quit IRC (Quit: Peace)
17:54 🔗 icedice has quit IRC (Quit: Leaving)
17:57 🔗 Dimtree has joined #archiveteam-bs
18:15 🔗 jmtd has quit IRC (Ping timeout: 252 seconds)
18:15 🔗 i0npulse has quit IRC (Ping timeout: 252 seconds)
18:15 🔗 w0rmhole has quit IRC (Ping timeout: 252 seconds)
18:16 🔗 Jon- has joined #archiveteam-bs
18:16 🔗 Flashfire has quit IRC (Ping timeout: 252 seconds)
18:16 🔗 hook54321 has quit IRC (Ping timeout: 252 seconds)
18:18 🔗 kiska has quit IRC (Ping timeout: 252 seconds)
18:43 🔗 i0npulse has joined #archiveteam-bs
18:46 🔗 hook54321 has joined #archiveteam-bs
20:39 🔗 Lord_Nigh has quit IRC (Quit: ZNC - http://znc.in)
20:41 🔗 Lord_Nigh has joined #archiveteam-bs
21:09 🔗 ColdIce has joined #archiveteam-bs
21:13 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
21:14 🔗 bitBaron has joined #archiveteam-bs
21:26 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
21:27 🔗 bitBaron has joined #archiveteam-bs
21:35 🔗 Flashfire has joined #archiveteam-bs
23:50 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
