#archiveteam-bs 2019-09-10,Tue

Time Nickname Message
00:10 🔗 Sokar has quit IRC (Ping timeout: 258 seconds)
00:11 🔗 BlueMax has joined #archiveteam-bs
00:26 🔗 Sokar has joined #archiveteam-bs
00:39 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
01:05 🔗 JAA "9th Circuit holds that scraping a public website likely does not violate the CFAA, even after website owner prohibits with a cease-and-desist letter; language strongly suggests CFAA only applies to bypassing authentication."
01:05 🔗 JAA https://twitter.com/OrinKerr/status/1171116153948626944
01:06 🔗 Ryz Yes, all the loot, all of it~
01:25 🔗 Raccoon FREE AARON SWARTZ
01:35 🔗 arkiver JAA: wooooh awesome!
01:36 🔗 arkiver we'll get everything now
01:37 🔗 Raccoon everyone's everything.
02:30 🔗 Zebranky_ is now known as Zebranky
03:31 🔗 qw3rty has joined #archiveteam-bs
03:39 🔗 jognsmith has joined #archiveteam-bs
03:40 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
03:44 🔗 odemgi_ has joined #archiveteam-bs
03:45 🔗 odemg has quit IRC (Read error: Operation timed out)
03:48 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:00 🔗 odemg has joined #archiveteam-bs
04:39 🔗 Quirk8 has quit IRC (END OF LINE)
04:41 🔗 Quirk8 has joined #archiveteam-bs
04:53 🔗 tuluu has quit IRC (Quit: tuluu)
04:56 🔗 tuluu has joined #archiveteam-bs
05:03 🔗 larryv has quit IRC (Quit: larryv)
06:04 🔗 killsushi has quit IRC (Ping timeout: 255 seconds)
06:13 🔗 killsushi has joined #archiveteam-bs
07:59 🔗 Fusl_ SketchCow: can you pull those items with warcs out of open source and put them into a separate collection + make sure they are indexed into wbm?
07:59 🔗 Fusl_ https://archive.org/search.php?query=archiveteam_sonysketchimg_
08:41 🔗 killsushi has quit IRC (Quit: Leaving)
08:52 🔗 godane SketchCow: just to let you know the new SD Times are not in the SD Times Collection you made years ago : https://archive.org/details/sdtimes
08:53 🔗 godane example : https://archive.org/details/sdtimes287
08:57 🔗 deevious has quit IRC (Quit: deevious)
09:06 🔗 godane has quit IRC (Leaving.)
09:09 🔗 Raccoon has quit IRC (Remote host closed the connection)
10:05 🔗 godane has joined #archiveteam-bs
10:35 🔗 deevious has joined #archiveteam-bs
11:18 🔗 VerifiedJ has joined #archiveteam-bs
11:28 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
12:12 🔗 ave_ has joined #archiveteam-bs
12:17 🔗 DogsRNice has joined #archiveteam-bs
12:26 🔗 Dallas has quit IRC (Quit: The Lounge - https://thelounge.chat)
12:26 🔗 Dallas has joined #archiveteam-bs
12:28 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
12:50 🔗 qw3rty has joined #archiveteam-bs
13:09 🔗 Raccoon has joined #archiveteam-bs
14:21 🔗 kiska1 has quit IRC (Remote host closed the connection)
14:21 🔗 Ryz has quit IRC (Remote host closed the connection)
14:22 🔗 Ryz has joined #archiveteam-bs
14:22 🔗 Fusl sets mode: +o Ryz
14:22 🔗 kiska1 has joined #archiveteam-bs
14:22 🔗 Fusl_ sets mode: +o Ryz
14:22 🔗 Fusl__ sets mode: +o Ryz
14:22 🔗 Fusl__ sets mode: +o kiska1
14:22 🔗 svchfoo1 sets mode: +o kiska1
14:22 🔗 Fusl sets mode: +o kiska1
14:22 🔗 Fusl_ sets mode: +o kiska1
14:27 🔗 ave_ has quit IRC (Quit: Connection closed for inactivity)
14:55 🔗 Raccoon` has joined #archiveteam-bs
14:58 🔗 Raccoon has quit IRC (Ping timeout: 360 seconds)
14:59 🔗 Raccoon` is now known as Raccoon
15:44 🔗 larryv has joined #archiveteam-bs
15:46 🔗 SketchCow Fusl_: Why are those not going into archiveteam_inbox
15:48 🔗 SketchCow I've gone ahead and moved them. They'll probably go into WBM but I don't know how they do things anymore.
15:48 🔗 SketchCow !a http://www.fyfz.cn/
15:52 🔗 Raccoon they should appoint you as president god emperor of WBM
15:53 🔗 VADemon has joined #archiveteam-bs
16:01 🔗 SketchCow Oh I do not want that job.
16:03 🔗 Raccoon isn't that how you got into this
16:07 🔗 ivan_ I think SketchCow does computing history archiving, not 'ingest the whole web lol'
16:07 🔗 SketchCow I got pulled in to 'preserve software' and it turns out I did a bunch of other shit
16:08 🔗 pnJay I thought his primary focus was soy sauce :)
16:09 🔗 Raccoon WBM should offer proper search engine results, but under the guise of being a public record archive to bypass European law
16:10 🔗 Raccoon all search results are at least 30 seconds old, making it right and proper.
16:11 🔗 ivan_ Raccoon is going to donate the petabyte and the expertise to make it happen
16:12 🔗 arkiver Awesome :)
16:12 🔗 arkiver :P
16:12 🔗 RichardG has quit IRC (Ping timeout: 246 seconds)
16:13 🔗 K4k has joined #archiveteam-bs
16:13 🔗 Raccoon even just old school 2003 google or 1998 Altavista would be nice
16:14 🔗 Raccoon as long as I can get search results that aren't pre-filtered for my protection, de-ribbed for my pleasure.
16:14 🔗 arkiver ivan_: he doesn't get it :P
16:16 🔗 Raccoon WBM is supposed to already have all the page content. just how large would the index be to make it searchable?
16:17 🔗 Raccoon and while bsing about this, is there any way to search WBM for all page titles beginning with "Index of" like I used to be able to do with Google up until the last few years
16:18 🔗 Raccoon I miss the glory days of 6 to 10 years ago when I was wget'ing every open directory for the sake of filling harddrives
16:18 🔗 ivan_ 377 billion pages * 10KB of actual text = 3.77PB
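ivan_'s back-of-envelope figure can be checked in a few lines. This is only a sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes) and takes the 10 KB of text per page as the rough average quoted above, not a measured value. The names are made up for illustration.

```python
# Rough size of the raw text, using the numbers quoted in the log.
PAGES = 377_000_000_000        # 377 billion pages (figure from the log)
BYTES_PER_PAGE = 10 * 1000     # 10 KB of text per page (assumed average)
BYTES_PER_PB = 10**15          # decimal petabyte (assumption)


def estimate_text_petabytes(pages=PAGES, bytes_per_page=BYTES_PER_PAGE):
    """Return the estimated total text size in petabytes."""
    return pages * bytes_per_page / BYTES_PER_PB


print(estimate_text_petabytes())
```

With binary units (1 KiB = 1,024 bytes, 1 PiB = 2^50 bytes) the result would shift slightly, but the order of magnitude stays the same.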
16:19 🔗 Raccoon can it be indexed?
16:19 🔗 ivan_ what do you think indexes are made of, Raccoon
16:20 🔗 * Raccoon searches for any ex-HOTBOT employees who might still be alive in 2019
16:20 🔗 ivan_ unless you've got exotic compression schemes it's something like a giant KV of (normalized word) -> a list of pointers to every document that has that word
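The "giant KV" ivan_ describes is usually called an inverted index. A toy in-memory version might look like the following. It is a sketch for illustration, not how the Wayback Machine or any real search engine stores its index: the tokenizer, the lowercase-only normalization, and the function names are all assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalize(word):
    """Very crude normalization: lowercase only (no stemming)."""
    return word.lower()


def build_index(docs):
    """Map each normalized word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in TOKEN.findall(text):
            index[normalize(word)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = [normalize(w) for w in TOKEN.findall(query)]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    docs = {1: "Index of /pub", 2: "Open directory index", 3: "Soy sauce"}
    print(search(build_index(docs), "index of"))
```

Every distinct word costs one key plus one pointer per document it appears in, which is why the index for a collection this size ends up on the same scale as the text itself.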
16:22 🔗 Raccoon if 90% of words in a book are structural connector words, we can probably shave it down to just indexing 10% of a page's content. Those words that rank with low popularity
16:22 🔗 Raccoon words like 'bukake' or 'palin nudes'
16:23 🔗 Fusl_ SketchCow: they were uploaded prior to the existence of the inbox
16:24 🔗 SketchCow What a lame excuse
16:24 🔗 SketchCow Anyway, all set
16:24 🔗 Raccoon also betting a good chunk of that 3.7PB is html tags, tables, scripts, and now css
16:25 🔗 Raccoon each page could probably be assigned with just 10 to 100 english index words.
16:29 🔗 SketchCow Speaking of BS
16:30 🔗 SketchCow So, years ago I made that cute thing that would look at an archivebot item, take out a nice pleasant set of screenshots of the pages, and then post them as .jpg files just so the things looked good.
16:30 🔗 SketchCow I'd love to do that again - my concern is I could get my mega-hacky thing working again, but it's probably stupid easy now.
16:30 🔗 SketchCow Maybe someone has something lying around - otherwise, I can go find my scripts and get them going.
16:31 🔗 SketchCow Example of what I mean: https://archive.org/details/archiveteam_archivebot_go_20150107190002
16:33 🔗 DogsRNice thats really neat
16:34 🔗 PurpleSym SketchCow: Something like: google-chrome-stable --headless --disable-gpu --screenshot --window-size=1920,1080 <url>
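PurpleSym's one-liner could be wrapped in a small script to screenshot many URLs. The sketch below only builds and runs that command. The flags are the ones quoted above; the `--screenshot=<path>` form for choosing the output file, the binary being on `PATH` under the name `google-chrome-stable`, and the helper names are assumptions. It is not SketchCow's actual WEBBERGRABBER script.

```python
import subprocess


def build_screenshot_command(url, output_path="screenshot.png",
                             width=1920, height=1080,
                             binary="google-chrome-stable"):
    """Build the headless Chrome command line for one screenshot."""
    return [
        binary,
        "--headless",
        "--disable-gpu",
        f"--screenshot={output_path}",   # assumed: explicit output path form
        f"--window-size={width},{height}",
        url,
    ]


def take_screenshot(url, **kwargs):
    """Run Chrome once; raises CalledProcessError if Chrome exits non-zero."""
    subprocess.run(build_screenshot_command(url, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Chrome.
    print(" ".join(build_screenshot_command("https://example.org/")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.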
16:34 🔗 * Raccoon thinks he just got Cow shatted #bs :)
16:34 🔗 * phillipsj got the soy sauce reference.
16:34 🔗 DogsRNice https://ia902302.us.archive.org/7/items/archiveteam_archivebot_go_082/www.nc911truth.org-inf-20140728-030309-p4pky-00000.warc.gz.png
16:34 🔗 DogsRNice oh no...
16:35 🔗 SketchCow That's why we save them
16:36 🔗 DogsRNice yeah i get it
16:40 🔗 VADemon > 1MB "preview" screenshot in .png
16:43 🔗 DogsRNice i just found one of chipotle's twitter with a swastika on it
16:43 🔗 DogsRNice https://ia802603.us.archive.org/35/items/archiveteam_archivebot_go_20150209010002/twitter.com-inf-20150208-022652-a8aok-00000.warc.gz.png
16:44 🔗 DogsRNice someone really doesn't like burritos
16:46 🔗 phillipsj VADemon, jpg would probably make the text hard to read.
16:47 🔗 systwiALT has joined #archiveteam-bs
16:49 🔗 VADemon I appreciate the lossless quality but it's a preview. It's not supposed to be larger than the item. (+unoptimized png)
16:51 🔗 systwiAL_ has quit IRC (Read error: Operation timed out)
16:54 🔗 systwiALT has quit IRC (Read error: Operation timed out)
17:18 🔗 VerifiedJ has quit IRC (Quit: Leaving)
17:32 🔗 RichardG has joined #archiveteam-bs
18:26 🔗 SketchCow PurpleSym: Let me try it
18:28 🔗 SketchCow http://teamarchive1.fnf.archive.org/screenshot.png
18:28 🔗 SketchCow No fuckin' complaints
18:41 🔗 jognsmith has quit IRC (Remote host closed the connection)
18:45 🔗 SketchCow I found my warc screenshotter (It's called WEBBERGRABBER) and will now do the work, and thanks to you it'll do screenshots REALLY fast.
18:45 🔗 SketchCow So that's appreciated.
18:54 🔗 Sanqui I'm excited for more screenshots.
19:14 🔗 Jens has quit IRC (Remote host closed the connection)
19:14 🔗 Jens has joined #archiveteam-bs
19:27 🔗 SketchCow Oops, wiped a script out
19:27 🔗 SketchCow Well, luckily it doesn't do much
19:40 🔗 ndiddy has quit IRC (Quit: WeeChat 1.4)
19:41 🔗 ndiddy has joined #archiveteam-bs
19:58 🔗 SketchCow OK, screenshotter's back in business.
19:58 🔗 SketchCow http://teamarchive1.fnf.archive.org/WEBGRAB/
20:27 🔗 Stiletto has quit IRC (Read error: Operation timed out)
20:30 🔗 Stiletto has joined #archiveteam-bs
20:58 🔗 katocala has quit IRC ()
21:06 🔗 Raccoon has quit IRC (Read error: Connection reset by peer)
21:09 🔗 katocala has joined #archiveteam-bs
21:32 🔗 kiskabak has quit IRC (Remote host closed the connection)
21:32 🔗 kiskabak has joined #archiveteam-bs
21:32 🔗 Fusl sets mode: +o kiskabak
21:32 🔗 Fusl__ sets mode: +o kiskabak
21:32 🔗 Fusl_ sets mode: +o kiskabak
22:06 🔗 killsushi has joined #archiveteam-bs
22:37 🔗 jognsmith has joined #archiveteam-bs
22:44 🔗 jognsmith Hello arkiver :)
22:44 🔗 arkiver you said fotolog?
22:44 🔗 arkiver https://www.archiveteam.org/index.php?title=Fotolog this?
22:44 🔗 jognsmith sorry i meant live spaces, my bad
22:44 🔗 JAA https://www.archiveteam.org/index.php?title=Spaces_of_Windows_Live_Spaces_pending_for_download
22:44 🔗 Smiley has quit IRC (Read error: Operation timed out)
22:44 🔗 JAA (Link in the main chan is broken)
22:44 🔗 arkiver alright
22:44 🔗 arkiver yeah I saw the page
22:44 🔗 JAA The IRC logs don't go back that far.
22:44 🔗 arkiver i was confused since he said fotolog
22:45 🔗 arkiver I'm not sure if it was saved, which one was yours?
22:45 🔗 arkiver (I was not involved in this project)
22:45 🔗 arkiver jognsmith: ^
22:45 🔗 jognsmith photosoffmycats.spaces.live.com
22:46 🔗 jognsmith (i'd like to ask later about fotolog as well)
22:46 🔗 arkiver alright and which one for fotolog?
22:47 🔗 arkiver for fotolog.com
22:48 🔗 jognsmith wolf_alex
22:48 🔗 arkiver ok
22:48 🔗 arkiver If we have http://photosoffmycats.spaces.live.com/, I'm not sure where it is
22:48 🔗 arkiver perhaps chfoo knows something
22:48 🔗 jognsmith oh :c does that mean its lost?
22:49 🔗 arkiver could be
22:49 🔗 arkiver looking into the fotolog one now
22:49 🔗 jognsmith thank you
22:49 🔗 JAA So apparently that list is also part of these grabs: https://www.archiveteam.org/index.php?title=Talk:Windows_Live_Spaces#Phase_2:_Downloading_Hotlists
22:49 🔗 JAA Which are "Uploaded, awaiting verification", so at least they were grabbed at some point.
22:50 🔗 JAA underscor: According to the wiki page, you were running an FTP server for that project at the time. Do you know anything?
22:52 🔗 jognsmith oh! if they were grabbed it could mean good news i guess
22:52 🔗 arkiver are you sure it was wolf_alex?
22:52 🔗 arkiver so fotolog.com/wolf_alex ?
22:53 🔗 arkiver because I can't find it, and it's also not in the list of accounts we archived from fotolog.
22:53 🔗 jognsmith yes, it was http://www.fotolog.com/wolf_alex
22:54 🔗 jognsmith probably it wouldn't be grabbed though, i didn't have that many followers
22:55 🔗 arkiver I think we discovered users by checking followers, etc.
22:55 🔗 arkiver yeah I don't see it in the lists of accounts we archived :/
22:55 🔗 arkiver hopefully there will still be good news on spaces
22:55 🔗 jognsmith yeah i guessed so :/ it was a small site
22:56 🔗 jognsmith yes! the fact that the link is in the wiki gives me hope
23:20 🔗 SketchCow Happy to say the archivebot screenshotter works.
23:20 🔗 SketchCow (Just did a full-run.)
23:21 🔗 SketchCow Now I'm running it against archivebot in general.
23:21 🔗 JAA Nice
23:26 🔗 godane so that gaming computer i wanted to get is now back up to $700
23:26 🔗 godane was on sale for $580
23:31 🔗 coderobe has quit IRC (Remote host closed the connection)
23:38 🔗 godane SketchCow: what pre-built computer would you get for $600?
23:38 🔗 godane i was looking at this but it went back up to $700 : https://www.amazon.com/Dell-Inspiron-Desktop-Processor-Graphics/dp/B07Q3G3B67/
