#archiveteam-bs 2016-08-12,Fri

Time Nickname Message
00:00 🔗 Frogging http://bettermotherfuckingwebsite.com/
00:01 🔗 Ravenloft browsing via a cell phone connection is a pain, even if it is much faster than it was back in that time
00:01 🔗 Ravenloft if only people still care enough to make sites lightweight
00:06 🔗 MrRadar has joined #archiveteam-bs
00:19 🔗 DoomTay has joined #archiveteam-bs
00:40 🔗 BlueMaxim has joined #archiveteam-bs
01:12 🔗 schbirid2 has joined #archiveteam-bs
01:14 🔗 schbirid has quit IRC (Read error: Operation timed out)
01:34 🔗 dashcloud has quit IRC (Remote host closed the connection)
01:36 🔗 dashcloud has joined #archiveteam-bs
01:45 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
01:47 🔗 kristian_ has quit IRC (Leaving)
02:56 🔗 SketchCow I go "Why is it so hard for me to go through the FTP uploads to ingest into archive?"
02:56 🔗 SketchCow And the answers are:
02:56 🔗 SketchCow - No context of what I'm looking at
02:56 🔗 SketchCow - Crap that is obviously going to be taken down within milliseconds
02:56 🔗 SketchCow - Zero metadata
02:56 🔗 SketchCow - Drag and Drop wonderment of "well, I'm done working on this, give it to jason"
02:57 🔗 SketchCow So there we go.
02:58 🔗 xmc yep, makes sense
03:00 🔗 SketchCow Some stuff has been in there for north of a year.
03:00 🔗 SketchCow Time to get mean.
03:01 🔗 SketchCow And since this channel is apparently able to sustain the profound retardation of DoomTay, it can handle me open-calling the stuff I'm seeing on the FTP page, and going from there.
03:02 🔗 SketchCow First up, 3D Lemmings CD-ROM I had. Easy to do.
03:04 🔗 SketchCow Next, "Bally Alley", another of mine.
03:04 🔗 SketchCow I see. These are a bunch of one-page letters between Bally developers.
03:04 🔗 SketchCow And other sets.
03:06 🔗 SketchCow https://archive.org/details/ballyalley?and[]=bally%20alley
03:06 🔗 SketchCow I'm going to combine all the letters into one object.
03:07 🔗 SketchCow archive.org/details/
03:07 🔗 SketchCow Various_Bally_Developer_Related_Letters
03:11 🔗 SketchCow OK, the rest are letters that I'm more than happy to deal with in this fashion.
03:11 🔗 SketchCow So they're getting uploaded now, and will be in ballyalley
03:15 🔗 xmc b-alley
03:17 🔗 Stiletto has joined #archiveteam-bs
03:20 🔗 SketchCow OK, they're all in.
03:21 🔗 SketchCow https://archive.org/details/ballyalley?sort=-publicdate will show them populating as they go in.
03:21 🔗 SketchCow NEXT
03:21 🔗 SketchCow "Cinemageddon"
03:22 🔗 SketchCow So, basically, someone is robin-hooding me movies from a tracker.
03:23 🔗 SketchCow Oh, and magazines.
03:23 🔗 SketchCow OK, well, magazines first.
03:23 🔗 SketchCow SCREEN# for each in *.pdf
03:23 🔗 SketchCow > do
03:23 🔗 SketchCow > DELETE=1 /0/SCRIPTCITY/appleway "$each"
03:23 🔗 SketchCow > done
03:27 🔗 hook54321 SketchCow: "someone"?
03:27 🔗 DoomTay I thought the rule was "one thing=one item" or something like that
03:30 🔗 xmc generally.
03:38 🔗 DoomTay This might not actually become relevant until the fall, but let's say I find a scanner with OCR and I want to use it to scan the magazine
03:39 🔗 DoomTay With the intention of uploading that scan to archive.org
03:39 🔗 DoomTay Would it be better to use the scanner's OCR, or let archive.org do the OCRing
03:59 🔗 hook54321 Do we know what kind of software archive.org uses for OCR?
04:04 🔗 SketchCow Screen magazine uploads still going.
04:04 🔗 SketchCow (Lot of issues)
04:05 🔗 DoomTay "abbyy finereader 8.0" https://archive.org/post/386344/how-to-use-ocr-on-this-site
04:06 🔗 hook54321 Oh. I've used ABBYY before, from what I've seen it's pretty good.
04:06 🔗 SketchCow https://archive.org/details/magazine_rack?sort=-publicdate
04:06 🔗 SketchCow Screen's showing up
04:07 🔗 DoomTay Meanwhile, the scanner at university I will be using also has OCR, but I don't know what software it uses or how it compares
04:10 🔗 hook54321 Do you know what kind of scanner it is?
04:11 🔗 DoomTay Only that it's a KIC book scanner
04:16 🔗 xmc abbyy is some of the best available ocr software
04:17 🔗 DoomTay In that case, my choice is clear
04:27 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:32 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
04:33 🔗 Sk1d has joined #archiveteam-bs
04:46 🔗 Stiletto has joined #archiveteam-bs
04:49 🔗 hook54321 If you're scanning something with Chinese characters you might be better off doing the OCR yourself.
04:52 🔗 Frogging one does not simply type chinese
04:53 🔗 DoomTay Ha, I won't have to worry about that. These are all English language magazines
04:53 🔗 Frogging I guess you could use one of those applications to draw them
04:53 🔗 Frogging it'd be super tedious :p
04:56 🔗 yipdw of course you can type in Chinese or Japanese; that's what pinyin/romaji and IMEs are for
04:56 🔗 yipdw there's also plenty of OCR software that recognizes hanzi/kanji
04:58 🔗 Frogging yipdw: yeah, I know about the phonetic typing. But it wouldn't work if you can't read the characters
04:58 🔗 Frogging in order to type them phonetically, that is
04:58 🔗 yipdw always possible to learn
04:58 🔗 yipdw also OCR
04:58 🔗 yipdw there's also the SKIP method which can be handy
04:59 🔗 yipdw and http://tsukurimashou.osdn.jp/idsgrep.php.en
05:01 🔗 SketchCow OK, screen is REALLY showing up
05:01 🔗 SketchCow I'll do the covers later.
05:01 🔗 SketchCow Back to more cinemageddon
05:01 🔗 SketchCow I'm going to kill this FTP
05:02 🔗 SketchCow Cinemageddon# for each in *
05:02 🔗 SketchCow > do
05:02 🔗 SketchCow > /0/SCRIPTCITY/cdway "$each"
05:02 🔗 SketchCow > done
05:02 🔗 SketchCow Pumping in three movies from Cinemageddon
05:02 🔗 SketchCow Probably doomed, down within a week
05:02 🔗 SketchCow 3gb of movies
05:04 🔗 hook54321 SketchCow: how are you finding these FTP servers?
05:04 🔗 hook54321 :P
05:04 🔗 Frogging there's a project
05:04 🔗 SketchCow No, this is not "FTP servers"
05:05 🔗 SketchCow This is "I have an FTP site that people can upload to to have me ingest them into the archive" combined with "Some people have done a good job uploading items in a clear and concise fashion and others have basically dropped stuff into a disorganized shitpile"
05:05 🔗 SketchCow With a twist of "Fuck it, I will not sleep tonight until I murder this collection"
05:06 🔗 Frogging nice :p
05:06 🔗 hook54321 Would you rather have a "disorganized shitpile" or less stuff?
05:07 🔗 Frogging How about people put a bit of effort and label things properly
05:07 🔗 Frogging that too is an option isn't it
05:08 🔗 hook54321 i guess
05:08 🔗 hook54321 do people ever upload copyrighted stuff into the FTP server?
05:09 🔗 SketchCow Oh you are adorable
05:09 🔗 hook54321 huh?
05:09 🔗 SketchCow < hook54321> Would you rather have a "disorganized shitpile" or less stuff?
05:09 🔗 SketchCow This is how people get abusive partners, by the way
05:10 🔗 SketchCow OK, Cinemageddon stuff uploaded.
05:10 🔗 SketchCow NEXT
05:10 🔗 hook54321 SketchCow: people get abusive partners by uploading shit into an FTP server? :P
05:11 🔗 Frogging whoosh
05:11 🔗 SketchCow Yeah
05:11 🔗 Frogging bedtime for me
05:11 🔗 SketchCow It's OK, I don't need validation.
05:12 🔗 hook54321 ytho.jpg
05:14 🔗 SketchCow NEXT
05:14 🔗 SketchCow "DPRK Stuff"
05:15 🔗 yipdw is there some stuff on the Kwangmyong in that heap
05:15 🔗 yipdw that'd be badass
05:16 🔗 SketchCow What it APPEARS to be is a disorganized pile of shit.
05:16 🔗 SketchCow It's been sitting in this directory since January
05:16 🔗 HCross2 yipdw: I've got videos of dancing North Korean soldiers somewhere
05:16 🔗 SketchCow -rw------- 1 wacko wacko 6469990 Dec 28 2015 cds.zip
05:16 🔗 SketchCow -rw------- 1 wacko wacko 7676 Dec 28 2015 certs.zip
05:16 🔗 SketchCow -rw------- 1 wacko wacko 62790054 Sep 21 2014 daesong_towel.rar
05:16 🔗 SketchCow -rw------- 1 wacko wacko 487424 Sep 21 2014 ftp.doc
05:16 🔗 SketchCow -rw------- 1 wacko wacko 467984104 Dec 28 2015 gol.zip
05:16 🔗 SketchCow -rw------- 1 wacko wacko 226936053 Sep 21 2014 item_2.zip
05:16 🔗 SketchCow -rw-r--r-- 1 root root 21 May 26 2015 item_2.zip.txt
05:16 🔗 SketchCow -rw------- 1 wacko wacko 21037552 Sep 21 2014 kiyctc.zip
05:16 🔗 SketchCow -rw-r--r-- 1 root root 42466 May 26 2015 kiyctc.zip.txt
05:16 🔗 SketchCow -rw------- 1 wacko wacko 190081047 Sep 21 2014 korfilm.zip
05:17 🔗 SketchCow -rw-r--r-- 1 root root 22 May 26 2015 korfilm.zip.txt
05:17 🔗 SketchCow -rw------- 1 wacko wacko 1161564 Sep 21 2014 naenara_usertable.rar
05:17 🔗 SketchCow -rw------- 1 wacko wacko 16839853 Sep 21 2014 rodong.zip
05:17 🔗 SketchCow -rw-r--r-- 1 root root 53681 May 26 2015 rodong.zip.txt
05:17 🔗 SketchCow -rw------- 1 wacko wacko 24090843 Sep 21 2014 vok_and_gnu.zip
05:17 🔗 SketchCow It's one gig of material.
05:17 🔗 yipdw oh great
05:17 🔗 DoomTay Fun
05:17 🔗 SketchCow DPRK_stuff# for each in *.zip; do unzip -l $each >${each}.txt; done
05:17 🔗 SketchCow End-of-central-directory signature not found. Either this file is not
05:17 🔗 SketchCow a zipfile, or it constitutes one disk of a multi-part archive. In the
05:17 🔗 SketchCow latter case the central directory and zipfile comment will be found on
05:17 🔗 SketchCow the last disk(s) of this archive.
05:17 🔗 SketchCow unzip: cannot find zipfile directory in one of item_2.zip or
05:17 🔗 SketchCow item_2.zip.zip, and cannot find item_2.zip.ZIP, period.
05:17 🔗 SketchCow End-of-central-directory signature not found. Either this file is not
05:17 🔗 SketchCow a zipfile, or it constitutes one disk of a multi-part archive. In the
05:17 🔗 SketchCow latter case the central directory and zipfile comment will be found on
05:17 🔗 SketchCow the last disk(s) of this archive.
05:17 🔗 SketchCow unzip: cannot find zipfile directory in one of korfilm.zip or
05:18 🔗 SketchCow korfilm.zip.zip, and cannot find korfilm.zip.ZIP, period.
05:18 🔗 SketchCow So three of the zips are bad
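The "End-of-central-directory signature not found" error above usually means an upload was cut short, since that record sits at the very end of a zip file. A minimal sketch of a pre-check that skips such files before listing them, assuming bash and standard tail/grep; the function names `has_eocd` and `list_zip_contents` are made up for illustration and are not part of the scripts used in this log:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not from /0/SCRIPTCITY.

# Succeeds if the file contains the zip end-of-central-directory
# signature (bytes 50 4B 05 06) somewhere in its last 65557 bytes
# (22-byte record plus the maximum 65535-byte comment).
has_eocd() {
    local sig
    sig=$(printf 'PK\005\006')
    tail -c 65557 -- "$1" | LC_ALL=C grep -qaF -- "$sig"
}

# Writes "unzip -l" output to NAME.txt for each archive that passes
# the check and reports the ones that do not.
list_zip_contents() {
    local each
    for each in "$@"; do
        if has_eocd "$each"; then
            unzip -l "$each" > "${each}.txt"
        else
            printf 'skipping (no end-of-central-directory record): %s\n' "$each" >&2
        fi
    done
}
```

Something like `list_zip_contents *.zip` would then stand in for the loop above. The check only catches truncation; a file can pass it and still be damaged, so `unzip -t` remains the fuller test.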
05:18 🔗 SketchCow This is what has slowed me up before.
05:18 🔗 SketchCow So fuck it. They die.
05:18 🔗 hook54321 DDOS them to death
05:19 🔗 SketchCow root@teamarchive0:/0/CDROMS/DPRK_stuff# grep -i Kwangmyong *.txt
05:19 🔗 SketchCow Nothin
05:19 🔗 yipdw hm
05:19 🔗 yipdw oh well
05:19 🔗 hook54321 isn't there a way to sometimes read bad zip files?
05:19 🔗 SketchCow Probably.
05:19 🔗 SketchCow Not going to do it
05:20 🔗 SketchCow This isn't some mysterious found .zip file on the bottom of a trunk
05:20 🔗 SketchCow This is somebody who uploaded this shit to me and did it wrong.
05:20 🔗 SketchCow You_had_one_job.gif
05:21 🔗 SketchCow /0/SCRIPTCITY/cdway "DPRK_Stuff_-_Web_Material_From_North_Korean_Sites"
05:21 🔗 SketchCow OK, doing it the hard way with DPRK_Stuff_-_Web_Material_From_North_Korean_Sites.
05:21 🔗 SketchCow What collection does this dump into?
05:21 🔗 SketchCow web
05:22 🔗 SketchCow What type of item is this? (texts, software, movies, audio...) (texts is default)
05:22 🔗 SketchCow data
05:22 🔗 SketchCow We're doing this the hard way.
05:22 🔗 SketchCow We're putting this into web.
05:22 🔗 SketchCow Woot.
05:22 🔗 SketchCow Going to use the filename DPRK_Stuff_-_Web_Material_From_North_Korean_Sites...
05:22 🔗 SketchCow There we go.
05:22 🔗 hook54321 http://memedad.com/memes/951513.jpg
05:26 🔗 hook54321 http://memedad.com/memes/951519.jpg
05:29 🔗 SketchCow DPRK done
05:31 🔗 SketchCow Now "Floppy Images"
05:31 🔗 SketchCow Includes notes, references an e-mail. Can't find e-mail
05:33 🔗 SketchCow Uploading it as is.
05:33 🔗 SketchCow Not really useful.
05:34 🔗 SketchCow Floppy_Disks_Collection_Various_Batch_One:
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/ERM - Goofy's Express (Copy).img: [################################] 2/2 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/MOD Sound Files #2.img: [################################] 2/2 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/AU Format Sound 1.img: [################################] 2/2 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/Batch One Notes.txt: [################################] 1/1 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/17th Annual Triad Competition Logo and Schedule.img: [################################] 2/2 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/MOD Sound Files #1.img: [################################] 2/2 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/ERM - Corrupt Unbranded White Floppy (Corrupt #1).img: [################################] 1/1 - 00:00:00
05:34 🔗 SketchCow uploading Floppy_Disks_Collection_Various_Batch_One/ERM - ICLOGO WMF Picture File.img: [################################] 2/2 - 00:00:00
05:36 🔗 SmileyG has quit IRC (Read error: Operation timed out)
05:41 🔗 Smiley has joined #archiveteam-bs
05:46 🔗 DoomTay has quit IRC (Quit: Page closed)
05:51 🔗 SketchCow NEXT
05:51 🔗 SketchCow Podcasts
05:51 🔗 SketchCow Just figured out how I uploaded these, doing it so if it's already done it, it won't upload it, fixing it.
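The log does not show how the "if it's already done it, it won't upload it" check is implemented. One common way to get that behaviour is a marker file per upload; a minimal sketch, assuming bash, where `upload_once` and the `.uploaded` suffix are hypothetical and the uploader command is whatever you pass in:

```shell
#!/usr/bin/env bash
# Sketch only: upload_once and the ".uploaded" marker are assumptions,
# not the mechanism actually used in this log.

# Usage: upload_once FILE UPLOADER [UPLOADER_ARGS...]
# Runs "UPLOADER UPLOADER_ARGS... FILE" unless FILE.uploaded exists,
# and creates the marker only when the uploader exits successfully.
upload_once() {
    local file=$1
    shift
    local marker="${file}.uploaded"
    if [ -e "$marker" ]; then
        printf 'already uploaded, skipping: %s\n' "$file" >&2
        return 0
    fi
    "$@" "$file" && : > "$marker"
}
```

A failed upload leaves no marker behind, so rerunning the same loop retries only the files that did not make it.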
05:52 🔗 SketchCow https://archive.org/details/2005_podcastcoresample
05:52 🔗 SketchCow These are all items underneath
05:58 🔗 SketchCow https://archive.org/details/2005_podcastcoresample?sort=-publicdate
06:08 🔗 SketchCow Oh, it's going to be that for a while (two screens working on it)
06:08 🔗 SketchCow Going to open third screen
06:09 🔗 SketchCow NEWTON_Argonne# ls -l
06:09 🔗 SketchCow total 80152
06:09 🔗 SketchCow -rw------- 1 wacko wacko 82072984 Feb 26 2015 www.newton.dep.anl.gov.tar.gz
06:09 🔗 SketchCow No idea what this is.
06:10 🔗 SketchCow Figured it out.
06:11 🔗 SketchCow This is all in the wayback, but someone grabbed it.
06:14 🔗 SketchCow In it goes
06:17 🔗 SketchCow NOT-FULLY-UPLOADED-beos_haiku_stuff
06:17 🔗 SketchCow Someone trying to be helpful
06:17 🔗 SketchCow But
06:21 🔗 SketchCow -rw-r--r-- 1 wacko wacko 1638111662 Dec 12 2015 SWG_Media.zip
06:21 🔗 SketchCow I see this is nothing but Star Wars Galaxies Material.
06:25 🔗 SketchCow 28G ftp.hp.com_2012_softpaq_archive.tar
06:25 🔗 SketchCow Going up
06:37 🔗 SketchCow Now... now we are getting somewhere.
06:41 🔗 SketchCow Dents, dents are being made
07:11 🔗 hook54321 I am a dent. O_O
07:36 🔗 BlueMaxim has quit IRC (Quit: Leaving)
07:51 🔗 Honno has joined #archiveteam-bs
08:14 🔗 SketchCow Uploading continues.
08:15 🔗 SketchCow I now have 8 windows uploading from this inbox into the Archive.
08:15 🔗 SketchCow EVR Radio dregs, Podcasts, Minecraft Sets, Youtube grabs
08:18 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:22 🔗 dashcloud has joined #archiveteam-bs
09:01 🔗 SketchCow Now, I am going to bed. I have uploaded many gigabytes. The inbox is beginning to make sense (but there's still a huge pile left to go.)
09:21 🔗 godane SketchCow: The Mark Levin Show didn't need to be moved to the podcast collection right away
09:21 🔗 godane also the Podcast Collection causes a lot of my items to be downloadable only if you're logged in
09:32 🔗 godane even the kpfa podcast items are affected
09:44 🔗 BlueMaxim has joined #archiveteam-bs
09:47 🔗 godane SketchCow: btw the Electrical Workers will have to be getting its own collection: https://archive.org/search.php?query=subject%3A%22Electrical%20Workers%22%20uploader%3A%22slaxemulator%40gmail.com%22
09:49 🔗 godane whats funny is the pdfs you moved don't have the same problem that i'm having with the podcasts collection
10:26 🔗 kristian_ has joined #archiveteam-bs
12:14 🔗 davidar has joined #archiveteam-bs
12:48 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:56 🔗 SketchCow I have successfully flooded the incoming s3 queue for my accounts and have to wait for it to settle down!
14:57 🔗 SketchCow godane: Over time, the podcasts and magazine_rack collections will be gone through with a script and will give me or my scripts the option to find them and make collections for them.
14:58 🔗 SketchCow But moving it from "godane inbox" to "podcasts" or "magazine_rack" is at least a step in the right direction.
15:23 🔗 HCross2 My inbound queue is always full
15:24 🔗 godane ok
15:31 🔗 godane SketchCow: i also hope we can fix the items in the podcasts collection to be downloadable
15:32 🔗 godane cause otherwise no one else can download them
15:36 🔗 godane so i'm up to 808k items
15:36 🔗 godane btw my inbound queue is always full too
16:05 🔗 DoomTay has joined #archiveteam-bs
16:48 🔗 Simpbrain has quit IRC (Read error: Operation timed out)
16:55 🔗 Simpbrain has joined #archiveteam-bs
16:59 🔗 SketchCow We are monsters
17:27 🔗 VADemon has joined #archiveteam-bs
17:54 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:57 🔗 dashcloud has joined #archiveteam-bs
18:11 🔗 dashcloud has quit IRC (Read error: Operation timed out)
18:14 🔗 dashcloud has joined #archiveteam-bs
18:22 🔗 schbirid2 great to hear that someone is feeding cinemageddon into IA. it is a treasure trove of movies and should be archived if possible
18:30 🔗 SketchCow It's OK to hear
18:30 🔗 SketchCow Someone piling endless directories into my FTP site with no metadata effort is not great
18:32 🔗 schbirid2 totally
18:32 🔗 schbirid2 they should have included the descriptions and imdb ids and everything else that the tracker offers
18:33 🔗 schbirid2 i wonder if someone at that site would support a project like that
18:54 🔗 Simpbrain has quit IRC (Remote host closed the connection)
19:02 🔗 godane schbirid2: that was only an 8gb folder of Cinemageddon
19:02 🔗 godane also i think the Screen pdfs are from Cinemageddon
19:05 🔗 godane https://archive.org/details/Pirated_Copy_Man_yan_2004
19:06 🔗 godane https://archive.org/details/scifibuzz_uk_sci-fi_special_1996.mkv
19:15 🔗 tomwsmf has joined #archiveteam-bs
19:26 🔗 godane i'm starting to upload my Google Books grab of PC Mag
19:31 🔗 godane https://archive.org/details/PC-Mag-1982-02
19:33 🔗 godane SketchCow: my idea is to release the google books version of PC Magazine so it could at some point get 'fixes' for badly scanned pages, or have missing pages scanned, for a proper release
19:34 🔗 godane if nothing else we have the google version and maybe a proper re-release from it later
20:14 🔗 kristian_ has quit IRC (Leaving)
20:21 🔗 fie_ has quit IRC (Read error: Connection reset by peer)
20:47 🔗 RichardG_ has joined #archiveteam-bs
20:47 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
21:41 🔗 jk[SVP] has quit IRC (Ping timeout: 244 seconds)
21:41 🔗 jk[SVP] has joined #archiveteam-bs
21:52 🔗 RichardG_ is now known as RichardG
22:47 🔗 DoomTay has quit IRC (Quit: Page closed)
23:05 🔗 fie has joined #archiveteam-bs
23:31 🔗 Honno has quit IRC (Read error: Operation timed out)
