#archiveteam-bs 2018-09-30,Sun


Time Nickname Message
00:00 πŸ”— closure has joined #archiveteam-bs
00:01 πŸ”— hook54321 has joined #archiveteam-bs
00:06 πŸ”— Frogging has joined #archiveteam-bs
00:17 πŸ”— Jens has quit IRC (Read error: Operation timed out)
00:17 πŸ”— phuzion has quit IRC (Read error: Operation timed out)
00:22 πŸ”— i0npulse has joined #archiveteam-bs
00:22 πŸ”— BlueMax has quit IRC (Read error: Operation timed out)
00:24 πŸ”— phuzion has joined #archiveteam-bs
00:24 πŸ”— Jens has joined #archiveteam-bs
00:25 πŸ”— BlueMax has joined #archiveteam-bs
00:33 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
00:35 πŸ”— closure has joined #archiveteam-bs
00:48 πŸ”— ndiddy has quit IRC ()
00:58 πŸ”— closure has quit IRC (Read error: Operation timed out)
00:59 πŸ”— Swicher has joined #archiveteam-bs
01:01 πŸ”— closure has joined #archiveteam-bs
01:02 πŸ”— JAA Swicher: So the way we typically download sites in a distributed way splits it up into individual work items. These items normally only take a relatively short amount of time, e.g. a few minutes.
01:03 πŸ”— JAA So it's not a big issue if an individual machine needs to be shut down or has issues and crashes.
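
The work-item approach JAA describes can be sketched roughly as follows. This is a minimal illustration, not Archive Team's actual tracker or pipeline code: the names `make_work_items` and `run`, the item size, and the retry limit are all hypothetical.

```python
from collections import deque


def make_work_items(urls, item_size):
    """Split a URL list into small, independent work items (hypothetical helper)."""
    return [urls[i:i + item_size] for i in range(0, len(urls), item_size)]


def run(items, process, max_attempts=3):
    """Hand out items one at a time; put an item back on the queue if its worker fails."""
    queue = deque((item, 1) for item in items)
    done, abandoned = [], []
    while queue:
        item, attempt = queue.popleft()
        try:
            process(item)
            done.append(item)
        except Exception:
            # A crashed or shut-down worker only costs this one small item.
            if attempt < max_attempts:
                queue.append((item, attempt + 1))
            else:
                abandoned.append(item)
    return done, abandoned
```

Because each item is small, a failed worker loses at most a few minutes of work, and the item can simply be handed to another machine.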
01:26 πŸ”— godane so i got 1720 pdfs for ERIC archive so far with this audit
01:33 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
01:33 πŸ”— closure_ has joined #archiveteam-bs
01:43 πŸ”— Swicher JAA: What I was worried about when downloading the site only with the Warrior/ArchiveBot is that it does not cover it completely (or at least not the relevant parts). For example, if you check https://archive.org/download/archiveteam_archivebot_go_20180818100001/addons.mozilla.org-inf-20180729-181049-xew9s-00052.warc.os.cdx.gz (the latest index work that I found referring to the Mozilla site) and compare it with the list that I made, you will see that many things are missing.
01:43 πŸ”— Swicher That's why I started doing the scripts already mentioned. The only problem I haven't been able to solve is that it takes between 5 days and a week to go through all the pages with extensions, so any idea to optimize this is welcome. Just out of curiosity, are you saving the list on your own or did you add it as an ArchiveBot job?
01:47 πŸ”— JAA Swicher: Yeah, that grab is incomplete. The one from 2017-08-29 until early December 2017 should be more complete.
01:48 πŸ”— JAA I'll add your list to ArchiveBot. I won't grab it myself since I'm working on a different grab method.
01:49 πŸ”— JAA For any further discussion about AMO, please come to the dedicated channel: #outofammo
01:50 πŸ”— Swicher Ok, and thanks for the tip.
01:58 πŸ”— JAA Swicher: ArchiveBot job submitted, ID akifc65k7kfhpdhfbveh79v1c. It won't start immediately though.
01:59 πŸ”— closure_ has quit IRC (Read error: Operation timed out)
02:05 πŸ”— godane so i just learn why people like CHD files for isos
02:06 πŸ”— godane they save tons of space
02:06 πŸ”— godane about at least 50%
02:06 πŸ”— closure has joined #archiveteam-bs
02:12 πŸ”— godane i'm learning this cause i'm downloading the playstation official magazine demo discs from archive.org
02:29 πŸ”— godane so i got a ton of ps2 demo discs that may go up if they're not hosted on archive.org already
02:30 πŸ”— godane i have ripped at least 3 discs anyways
02:30 πŸ”— godane i have 2 psm dvds which are just video dvds
02:30 πŸ”— godane and a pc tools cd
02:31 πŸ”— closure has quit IRC (Read error: Operation timed out)
02:32 πŸ”— godane luckily there is this but no playstation official magazine ps2 discs
02:32 πŸ”— godane https://archive.org/download/PlayStation2-Demos
02:34 πŸ”— closure has joined #archiveteam-bs
02:42 πŸ”— kiska JAA: The A Listing for tian.yam.com: https://transfer.sh/VHO5D/tian.yam.com-fdns-a-listing
02:42 πŸ”— kiska I am going to try the any dataset
02:45 πŸ”— JAA kiska: T minus 23 hours 15 minutes
02:45 πŸ”— JAA 25*
02:46 πŸ”— kiska btw using pigz, and it's halving the lookup time
02:46 πŸ”— JAA Based on their description, I doubt there'll be more in the ANY set.
02:48 πŸ”— kisspunch It's been about a year, here's an update on storage prices: https://za3k.com/archive/storage-2018-10.sc.txt
02:49 πŸ”— JAA "thermal paper" lol
02:52 πŸ”— kiska JAA: Here is the any listing data: https://transfer.sh/vRZ7v/tian.yam.com-fdns-any-listing
02:54 πŸ”— JAA kiska: Yup, identical hostnames (except for those yamedia.tw ones).
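
The lookup and comparison discussed above could look something like the sketch below. It assumes the DNS dataset is a gzipped file with one JSON object per line carrying a `name` field; that layout, the function names, and the file paths are assumptions, not details given in the channel. It also uses Python's built-in `gzip` module rather than pigz, so it will not reproduce the speed-up kiska mentions.

```python
import gzip
import json


def hostnames_for_domain(dump_path, domain):
    """Collect hostnames equal to or under `domain` from a gzipped JSON-lines dump."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    with gzip.open(dump_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            # Cheap substring check before the more expensive JSON parse.
            if domain not in line.lower():
                continue
            # "name" is the assumed field holding the queried hostname.
            name = json.loads(line).get("name", "").lower().rstrip(".")
            if name == domain or name.endswith(suffix):
                found.add(name)
    return found


def compare(a_names, any_names):
    """Report hostnames that appear in only one of the two listings."""
    return {
        "only_in_a": sorted(a_names - any_names),
        "only_in_any": sorted(any_names - a_names),
    }
```

Matching on the full name or a dot-prefixed suffix keeps unrelated hosts that merely end in the same characters out of the result.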
02:56 πŸ”— JAA I feel like we won't be able to save pretty much anything from this site.
02:56 πŸ”— kiska Hrm, let's try cname resolves
02:58 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
03:00 πŸ”— closure has joined #archiveteam-bs
03:01 πŸ”— kiska I've submitted a request for the latest dns dataset, but I doubt I'll get a response soon™
03:13 πŸ”— ivan kisspunch: 8TB has been $160 for a while assuming you're OK with stripping He8 warranties and have the know-how to pry them out of the My Book/easystore cases
03:14 πŸ”— ivan oh I see the second link to a Seagate
03:14 πŸ”— ivan I will never buy an SMR
03:16 πŸ”— JAA There's nothing wrong with SMR when used for the right purpose. (But Seagate's doing a terrible job at communicating those limitations properly.)
03:16 πŸ”— JAA But -ot
03:20 πŸ”— kiska JAA: Another scrape of dns data: https://transfer.sh/wXvEp/tian.yam.com-subdomains-securitytrails
03:20 πŸ”— kiska There are ~100 subdomains in that
03:32 πŸ”— Flashfire Elon musk forced to resign
03:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
03:34 πŸ”— closure has joined #archiveteam-bs
03:35 πŸ”— jrwr Yep
03:38 πŸ”— archodg_ has joined #archiveteam-bs
03:40 πŸ”— archodg__ has quit IRC (Ping timeout: 252 seconds)
03:40 πŸ”— odemg has quit IRC (Ping timeout: 260 seconds)
03:48 πŸ”— kiska JAA: So do we want to start archiving what we already have? I think I can get more subdomains to process, but it looks like a very manual process, since their search is using POST data
03:53 πŸ”— odemg has joined #archiveteam-bs
03:56 πŸ”— BlueMax has quit IRC (Remote host closed the connection)
03:58 πŸ”— closure has quit IRC (Read error: Operation timed out)
03:59 πŸ”— BlueMax has joined #archiveteam-bs
03:59 πŸ”— closure has joined #archiveteam-bs
04:07 πŸ”— Mateon1 has quit IRC (Ping timeout: 268 seconds)
04:07 πŸ”— Mateon1 has joined #archiveteam-bs
04:09 πŸ”— ReimuHaku has quit IRC (Ping timeout: 633 seconds)
04:10 πŸ”— ReimuHaku has joined #archiveteam-bs
04:35 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
04:40 πŸ”— closure has joined #archiveteam-bs
04:55 πŸ”— fenn_ is now known as fenn
04:59 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
05:00 πŸ”— closure has joined #archiveteam-bs
05:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
07:20 πŸ”— ivan has quit IRC (Read error: Operation timed out)
07:20 πŸ”— JAA has quit IRC (Read error: Operation timed out)
07:20 πŸ”— Frogging has quit IRC (Read error: Operation timed out)
07:20 πŸ”— Frogging has joined #archiveteam-bs
07:20 πŸ”— Petri152 has quit IRC (Read error: Operation timed out)
07:20 πŸ”— zyphlar has quit IRC (Read error: Operation timed out)
07:21 πŸ”— Darkstar has quit IRC (Read error: Operation timed out)
07:21 πŸ”— jspiros has quit IRC (Read error: Operation timed out)
07:21 πŸ”— nightpool has quit IRC (Read error: Operation timed out)
07:21 πŸ”— Swicher has quit IRC (hub.efnet.us irc.Prison.NET)
07:21 πŸ”— achip has quit IRC (hub.efnet.us irc.Prison.NET)
07:22 πŸ”— c4rc4s has quit IRC (Read error: Operation timed out)
07:23 πŸ”— ivan has joined #archiveteam-bs
07:25 πŸ”— Mayonaise has quit IRC (Read error: Operation timed out)
07:25 πŸ”— Mayonaise has joined #archiveteam-bs
07:30 πŸ”— Darkstar has joined #archiveteam-bs
07:31 πŸ”— nightpool has joined #archiveteam-bs
07:32 πŸ”— achip has joined #archiveteam-bs
07:32 πŸ”— Swicher has joined #archiveteam-bs
07:47 πŸ”— schbirid has joined #archiveteam-bs
08:10 πŸ”— m007a83_ has joined #archiveteam-bs
08:15 πŸ”— jrwr_ has joined #archiveteam-bs
08:16 πŸ”— m007a83 has quit IRC (Read error: Operation timed out)
08:16 πŸ”— thejsa_ has joined #archiveteam-bs
08:18 πŸ”— Flashfire I NEED THE BEST WAY TO SCAN MAGAZINES NOW
08:18 πŸ”— Flashfire my parents want to throw out a bunch of old magazines and wont let me ship them out
08:19 πŸ”— Flashfire Godane
08:19 πŸ”— Flashfire Sketchcow
08:20 πŸ”— thejsa has quit IRC (Ping timeout: 633 seconds)
08:20 πŸ”— jrwr has quit IRC (Ping timeout: 633 seconds)
08:20 πŸ”— Jon- has quit IRC (Ping timeout: 633 seconds)
08:20 πŸ”— jrwr_ is now known as jrwr
08:20 πŸ”— jmtd has joined #archiveteam-bs
08:20 πŸ”— HCross ivan: https://www.ebuyer.com/771467-seagate-backup-plus-hub-8tb-external-hard-drive-stel8000200 this sort of thing?
08:21 πŸ”— c4rc4s has joined #archiveteam-bs
08:21 πŸ”— zyphlar has joined #archiveteam-bs
08:21 πŸ”— Petri152 has joined #archiveteam-bs
08:22 πŸ”— JAA has joined #archiveteam-bs
08:22 πŸ”— swebb sets mode: +o JAA
08:22 πŸ”— bakJAA_ sets mode: +o JAA
08:23 πŸ”— schbirid HCross ivan be aware that those are SMR drives. at least in germany wd mybook pros are at a similar price every now and then
08:23 πŸ”— HCross Dont worry, you can get a Porsche HDD https://www.ebuyer.com/767035-lacie-porsche-design-4tb-usb-3-0-desktop-drive-stew4000400
08:24 πŸ”— schbirid :D
08:26 πŸ”— jspiros has joined #archiveteam-bs
08:26 πŸ”— Flashfire Godane do you want some collectors mags?
08:26 πŸ”— w0rmhole flashfire: a scanner. maybe you could see if you could use a friend's or if a photocopy service will do it. if push comes to shove, you could lay them out flat or cut the pages out with an exacto knife and take photos of the pages
08:27 πŸ”— HCross Library?
08:27 πŸ”— Flashfire Library?
08:27 πŸ”— Flashfire No these are from a dvd collection from about 10 15 years ago
08:27 πŸ”— HCross Flashfire: would your local library have a scanner you could use?
08:28 πŸ”— Flashfire I dont think so
08:30 πŸ”— Flashfire Archive.org doesnt have them but auspost charges an arm and a leg
09:01 πŸ”— RichardG has quit IRC (Read error: Connection reset by peer)
09:02 πŸ”— RichardG has joined #archiveteam-bs
09:11 πŸ”— Jusque has quit IRC (Ping timeout: 260 seconds)
09:11 πŸ”— Jusque has joined #archiveteam-bs
09:29 πŸ”— kevinYang a PHP script to collect usernames with tian's search feature: https://transfer.sh/kNTuz/tian_fetch.php
09:30 πŸ”— kiska Yay!
09:31 πŸ”— kiska I have ~610 urls, are you able to run the script and give me unduplicated urls?
09:32 πŸ”— kiska Here is my list of URLs: https://pastebin.com/raw/rP4taKiW
11:03 πŸ”— eientei95 kevinYang:
11:03 πŸ”— eientei95 PHP Warning: mysqli_connect(): (HY000/1049): Unknown database 'tian_username' in /tian_fetch.php on line 4
11:03 πŸ”— eientei95 PHP Warning: mysqli_query() expects parameter 1 to be mysqli, boolean given in /tian_fetch.php on line 5
11:04 πŸ”— eientei95 ...
11:04 πŸ”— eientei95 nvm
11:05 πŸ”— eientei95 It keeps the <br> next to the username tho
11:07 πŸ”— eientei95 Ah, here's a real bug
11:07 πŸ”— eientei95 PHP Notice: ob_flush(): failed to flush buffer. No buffer to flush in /tian_fetch.php on line 46
11:11 πŸ”— eientei95 kiska: +159 from 200 usernames scraped using kevinYang's script
11:11 πŸ”— kiska Pastebin?
11:12 πŸ”— kiska There should be ~10-20k blogs, since I believe yam is a pretty big host of them
11:16 πŸ”— eientei95 kiska: https://pastebin.com/raw/kL6A8qMF
11:16 πŸ”— eientei95 Also you got a couple dupes in yours, plus the CDN URL
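
The two cleanup problems raised here (a stray `<br>` kept next to each username, and duplicates in the combined lists) can be handled with a small amount of post-processing. A minimal sketch, assuming the scraped usernames and URLs are plain strings; `clean_username` and `merge_unique` are hypothetical helpers and not part of kevinYang's script.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def clean_username(raw):
    """Strip stray <br> tags and surrounding whitespace from a scraped username."""
    return BR_TAG.sub("", raw).strip()


def merge_unique(*lists):
    """Merge lists, dropping blanks and duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for entries in lists:
        for entry in entries:
            entry = entry.strip()
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged
```

Keeping first-seen order means an existing list can be passed first and any newly scraped entries end up appended after it.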
11:17 πŸ”— kiska Hrm...
11:18 πŸ”— kiska It should skip them during the grab
11:19 πŸ”— kiska Also queued
11:30 πŸ”— decay has joined #archiveteam-bs
12:09 πŸ”— VerifiedJ has joined #archiveteam-bs
12:19 πŸ”— BlueMax has quit IRC (Quit: Leaving)
12:24 πŸ”— wp494 has quit IRC (Ping timeout: 506 seconds)
12:25 πŸ”— wp494 has joined #archiveteam-bs
12:30 πŸ”— kevinYang Here's ~20k unduplicated usernames sorted by their blogger_id: https://pastebin.com/raw/UJyRYxdB . Maybe there's an API to query usernames with blogger_id?
12:31 πŸ”— kiska Hrm... That is... useful
12:31 πŸ”— kevinYang I think there would be ~200k blogs since post IDs are up to 200M and total population of Taiwan is 23M.
12:37 πŸ”— bakJAA_ is now known as bakJAA
12:51 πŸ”— kiska JAA: Some useful information here
13:03 πŸ”— closure has joined #archiveteam-bs
13:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
13:32 πŸ”— closure has joined #archiveteam-bs
13:38 πŸ”— HCross http://c.hawc.eu/tianyamusers.txt
13:39 πŸ”— HCross full URL list based on what kevinYang sent
13:51 πŸ”— m007a83_ has quit IRC (Quit: Fuck you Comcast)
13:55 πŸ”— zerkalo has joined #archiveteam-bs
13:59 πŸ”— closure has quit IRC (Read error: Operation timed out)
14:00 πŸ”— closure has joined #archiveteam-bs
14:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
14:34 πŸ”— closure has joined #archiveteam-bs
14:58 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
14:58 πŸ”— Pixi has quit IRC (Quit: Pixi)
15:01 πŸ”— schbirid has quit IRC (Remote host closed the connection)
15:06 πŸ”— closure has joined #archiveteam-bs
15:08 πŸ”— Pixi has joined #archiveteam-bs
15:22 πŸ”— kiska kevinYang, join #archivebot. I'll be queuing 200 urls per ~30 minutes. Hopefully we'll be able to get a significant amount of those
15:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
15:33 πŸ”— closure_ has joined #archiveteam-bs
15:58 πŸ”— closure_ has quit IRC (Read error: Connection reset by peer)
15:59 πŸ”— closure has joined #archiveteam-bs
16:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
16:36 πŸ”— closure has joined #archiveteam-bs
17:00 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
17:00 πŸ”— closure_ has joined #archiveteam-bs
17:33 πŸ”— closure has joined #archiveteam-bs
17:33 πŸ”— closure_ has quit IRC (Read error: Connection reset by peer)
17:51 πŸ”— icedice has joined #archiveteam-bs
17:52 πŸ”— jut_ has quit IRC (Quit: WeeChat 2.2)
18:00 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
18:00 πŸ”— closure_ has joined #archiveteam-bs
18:07 πŸ”— jut has joined #archiveteam-bs
18:33 πŸ”— closure_ has quit IRC (Read error: Connection reset by peer)
18:33 πŸ”— closure has joined #archiveteam-bs
18:38 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
18:40 πŸ”— closure has joined #archiveteam-bs
19:03 πŸ”— closure has quit IRC (Read error: Operation timed out)
19:06 πŸ”— closure has joined #archiveteam-bs
19:34 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
19:34 πŸ”— closure_ has joined #archiveteam-bs
20:00 πŸ”— closure_ has quit IRC (Ping timeout: 252 seconds)
20:00 πŸ”— closure has joined #archiveteam-bs
20:03 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
20:04 πŸ”— closure has joined #archiveteam-bs
20:32 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
20:35 πŸ”— closure_ has joined #archiveteam-bs
21:01 πŸ”— closure_ has quit IRC (Read error: Connection reset by peer)
21:01 πŸ”— closure has joined #archiveteam-bs
21:33 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
21:34 πŸ”— closure has joined #archiveteam-bs
21:59 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
22:00 πŸ”— closure_ has joined #archiveteam-bs
22:33 πŸ”— closure_ has quit IRC (Read error: Connection reset by peer)
22:33 πŸ”— closure has joined #archiveteam-bs
23:00 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
23:01 πŸ”— closure has joined #archiveteam-bs
23:07 πŸ”— VerifiedJ has quit IRC (Quit: Leaving)
23:34 πŸ”— closure has quit IRC (Read error: Connection reset by peer)
23:34 πŸ”— closure_ has joined #archiveteam-bs
23:57 πŸ”— BlueMax has joined #archiveteam-bs
23:58 πŸ”— closure_ has quit IRC (Read error: Operation timed out)
